BoyerMooreSearch Class Reference

BoyerMooreSearch.

#include <bmsearch.h>

Inheritance diagram for BoyerMooreSearch:
BoyerMooreSearch derives from UObject, which derives from UMemory.

Public Member Functions

 BoyerMooreSearch (CollData *theData, const UnicodeString &patternString, const UnicodeString *targetString, UErrorCode &status)
 Construct a BoyerMooreSearch object.
 ~BoyerMooreSearch ()
 The destructor.
UBool empty ()
 Test the pattern to see if it generates any CEs.
UBool search (int32_t offset, int32_t &start, int32_t &end)
 Search for the pattern string in the target string.
void setTargetString (const UnicodeString *targetString, UErrorCode &status)
 Set the target string for the match.
CollData * getData ()
 Return the CollData object used for searching.
CEList * getPatternCEs ()
 Return the CEs generated by the pattern string.
BadCharacterTable * getBadCharacterTable ()
 Return the BadCharacterTable object computed for the pattern string.
GoodSuffixTable * getGoodSuffixTable ()
 Return the GoodSuffixTable object computed for the pattern string.
virtual UClassID getDynamicClassID () const
 UObject glue.

Static Public Member Functions

static UClassID getStaticClassID ()
 UObject glue.

Detailed Description

BoyerMooreSearch.

This object holds the information needed to do a collation-sensitive Boyer-Moore search. It encapsulates the pattern, the "bad character" and "good suffix" tables, the Collator-based data needed to compute them, and a reference to the text being searched.

To do a search, you first need to get a CollData object by calling CollData::open. Then construct a BoyerMooreSearch object from the CollData object, the pattern string, and the target string, and call the search method. Here's a code sample:

 #include <bmsearch.h>  // internal ICU header (technology preview)
 
 U_NAMESPACE_USE
 
 void boyerMooreExample(UCollator *collator, UnicodeString *pattern, UnicodeString *target)
 {
     UErrorCode status = U_ZERO_ERROR;
     CollData *collData = CollData::open(collator, status);
     if (U_FAILURE(status)) {
         // could not create a CollData object
         return;
     }
     BoyerMooreSearch *search = new BoyerMooreSearch(collData, *pattern, target, status);
     if (search == NULL || U_FAILURE(status)) {
         // could not create a BoyerMooreSearch object; after a failed
         // construction, the only safe thing to do with it is delete it
         delete search;
         CollData::close(collData);
         return;
     }
     int32_t offset = 0, start = -1, end = -1;
     // Find all matches
     while (search->search(offset, start, end)) {
         // process the match between start and end here

         // advance past the match
         offset = end;
     }
     // at this point, if offset == 0, there were no matches
     if (offset == 0) {
         // handle the case of no matches
     }
     delete search;
     CollData::close(collData);
     // CollData objects are cached, so the call to
     // CollData::close doesn't delete the object.
     // Call this if you don't need the object any more.
     CollData::flushCollDataCache();
 }
 

NOTE: This is a technology preview. The final version of this API may not bear any resemblance to this API.

Known limitations: 1) Backwards searching has not been implemented.

2) For Han and Hangul characters, this code ignores any collation tailorings. In general, this isn't a problem, but in Korean locales, at strength 1, Hangul characters are tailored to be equal to Han characters with the same pronunciation. Because this code ignores tailorings, searching for a Hangul character will not find a Han character, and vice versa.

3) In some cases, searching for a pattern that needs to be normalized and ends in a discontiguous contraction may fail. The only known cases of this are with the Tibetan script. For example, searching for the pattern "\u0F7F\u0F80\u0F81\u0F82\u0F83\u0F84\u0F85" will fail. (This case is artificial. We've been unable to find a practical, real-world example of this failure.)

Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
See also:
CollData

Definition at line 107 of file bmsearch.h.


Constructor & Destructor Documentation

BoyerMooreSearch::BoyerMooreSearch ( CollData *theData,
const UnicodeString &patternString,
const UnicodeString *targetString,
UErrorCode &status 
)

Construct a BoyerMooreSearch object.

Parameters:
theData - A CollData object holding the Collator-sensitive data
patternString - the string for which to search
targetString - the string in which to search, or NULL if you will set it later by calling setTargetString.
status - will be set if any errors occur.

Note: if on return, status is set to an error code, the only safe thing to do with this object is to call the destructor.

Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
BoyerMooreSearch::~BoyerMooreSearch (  ) 

The destructor.

Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview

Member Function Documentation

UBool BoyerMooreSearch::empty (  ) 

Test the pattern to see if it generates any CEs.

Returns:
TRUE if the pattern string did not generate any CEs
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
BadCharacterTable* BoyerMooreSearch::getBadCharacterTable (  ) 

Return the BadCharacterTable object computed for the pattern string.

Returns:
the BadCharacterTable object.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
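The bad character rule can be sketched as follows. This is a minimal, hypothetical example that assumes plain UTF-16 code units stand in for the collation elements the real BadCharacterTable is built from; the class SimpleBadCharTable and its methods are illustrative only and are not part of ICU. For each pattern symbol it records the index of the symbol's last occurrence, and on a mismatch at pattern position j it suggests sliding the pattern so that occurrence lines up with the mismatched text symbol.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical, simplified bad character table (NOT ICU's BadCharacterTable).
// It is keyed by UTF-16 code units; the real class is built from CEs.
class SimpleBadCharTable {
public:
    explicit SimpleBadCharTable(const std::u16string &pattern) {
        // Later occurrences overwrite earlier ones, so each entry ends up
        // holding the index of the symbol's last occurrence in the pattern.
        for (std::u16string::size_type i = 0; i < pattern.size(); ++i) {
            lastIndex_[pattern[i]] = static_cast<int>(i);
        }
    }

    // Index of the last occurrence of c in the pattern, or -1 if absent.
    int lastOccurrence(char16_t c) const {
        auto it = lastIndex_.find(c);
        return it == lastIndex_.end() ? -1 : it->second;
    }

    // Shift suggested when text symbol c mismatches pattern position j:
    // line up the last occurrence of c with the mismatch, or move past it
    // entirely if c is absent. Never suggest less than one position.
    int shift(char16_t c, int j) const {
        int s = j - lastOccurrence(c);
        return s > 0 ? s : 1;
    }

private:
    std::unordered_map<char16_t, int> lastIndex_;
};
```

For the pattern "abcab", a mismatch at position 4 against a symbol that does not occur in the pattern, such as 'z', lets the search skip 5 positions, the whole pattern length.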
CollData* BoyerMooreSearch::getData (  ) 

Return the CollData object used for searching.

Returns:
the CollData object used for searching
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
virtual UClassID BoyerMooreSearch::getDynamicClassID (  )  const [virtual]

UObject glue.

Implements UObject.

GoodSuffixTable* BoyerMooreSearch::getGoodSuffixTable (  ) 

Return the GoodSuffixTable object computed for the pattern string.

Returns:
the GoodSuffixTable object computed for the pattern string.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
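A naive construction of a good suffix table, again sketched over UTF-16 code units rather than CEs, might look like this. The function buildGoodSuffixTable is hypothetical, and the use of the "strong" good suffix rule is an assumption about what this class computes. Entry j is the shift to apply when pattern position j mismatches after the rest of the pattern to its right has already matched. The function tries each shift in turn, which costs O(m³) in the worst case; it is written for readability, not speed.

```cpp
#include <string>
#include <vector>

// Hypothetical good suffix table (NOT ICU's GoodSuffixTable), keyed by
// mismatch position and built over UTF-16 code units with the strong rule.
// shift[j] is how far to slide the pattern when pattern[j] mismatches after
// pattern[j+1 .. m-1] has already matched the text.
std::vector<int> buildGoodSuffixTable(const std::u16string &pattern) {
    const int m = static_cast<int>(pattern.size());
    std::vector<int> shift(m, m);
    for (int j = 0; j < m; ++j) {
        // Try each candidate shift s; s == m always qualifies, because the
        // shifted pattern then starts past the old window entirely.
        for (int s = 1; s <= m; ++s) {
            // The part of the shifted pattern that overlaps the matched
            // suffix must agree with it.
            bool suffixAgrees = true;
            for (int k = j + 1; k < m; ++k) {
                if (k - s >= 0 && pattern[k - s] != pattern[k]) {
                    suffixAgrees = false;
                    break;
                }
            }
            // Strong rule: the symbol now opposite the mismatched text
            // position must differ from pattern[j], or it would fail again.
            bool differsAtMismatch = (j - s < 0) || pattern[j - s] != pattern[j];
            if (suffixAgrees && differsAtMismatch) {
                shift[j] = s;
                break;
            }
        }
    }
    return shift;
}
```

For the pattern "abcab" this yields {3, 3, 3, 5, 1}. A Boyer-Moore search slides the pattern by the larger of this value and the bad character shift.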
CEList* BoyerMooreSearch::getPatternCEs (  ) 

Return the CEs generated by the pattern string.

Returns:
a CEList object holding the CEs generated by the pattern string.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
static UClassID BoyerMooreSearch::getStaticClassID (  )  [static]

UObject glue.

UBool BoyerMooreSearch::search ( int32_t  offset,
int32_t &  start,
int32_t &  end 
)

Search for the pattern string in the target string.

Parameters:
offset - the offset in the target string at which to begin the search
start - will be set to the starting offset of the match, or -1 if there's no match
end - will be set to the ending offset of the match, or -1 if there's no match
Returns:
TRUE if the match succeeds, FALSE otherwise.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
void BoyerMooreSearch::setTargetString ( const UnicodeString *targetString,
UErrorCode &status 
)

Set the target string for the match.

Parameters:
targetString - the new target string
status - will be set if any errors occur.
Internal:
Do not use. This API is for internal use only. ICU 4.0.1 technology preview
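The contract of search and setTargetString can be illustrated with a hypothetical stand-in class, LiteralSearch, which is not part of ICU. It assumes that end is one past the last matched code unit (as the class example's offset = end step suggests) and that an empty pattern never matches. It compares code units literally with std::u16string::find, so unlike BoyerMooreSearch it is not collation-sensitive. It does show the split of responsibilities: the pattern is fixed at construction, while the target can be swapped without rebuilding anything derived from the pattern.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in mirroring the shape of the BoyerMooreSearch API.
// It matches code units literally; it is NOT collation-aware.
class LiteralSearch {
public:
    LiteralSearch(const std::u16string &pattern, const std::u16string *target)
        : pattern_(pattern), target_(target) {}

    // Like setTargetString: only the target changes. In BoyerMooreSearch,
    // the pattern CEs and shift tables would be kept as they are.
    void setTargetString(const std::u16string *target) { target_ = target; }

    // Returns true and sets [start, end) to the first match at or after
    // offset; otherwise returns false and sets start and end to -1.
    bool search(int32_t offset, int32_t &start, int32_t &end) const {
        start = end = -1;
        if (target_ == nullptr || pattern_.empty() || offset < 0) {
            return false;
        }
        std::u16string::size_type pos =
            target_->find(pattern_, static_cast<std::u16string::size_type>(offset));
        if (pos == std::u16string::npos) {
            return false;
        }
        start = static_cast<int32_t>(pos);
        end = static_cast<int32_t>(pos + pattern_.size());
        return true;
    }

private:
    std::u16string pattern_;
    const std::u16string *target_;  // caller keeps the target alive
};

// Counts non-overlapping matches, advancing past each one as in the
// class example's loop.
int countMatches(const LiteralSearch &s) {
    int32_t offset = 0, start = -1, end = -1;
    int count = 0;
    while (s.search(offset, start, end)) {
        ++count;
        offset = end;
    }
    return count;
}
```

For example, searching for "ab" in "abcabcab" finds matches starting at 0, 3, and 6; after setTargetString switches to "xyz", search returns false with start and end set to -1.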

The documentation for this class was generated from the following file: bmsearch.h

Generated on Sat Jan 23 15:17:41 2010 for ICU 4.3.4 by  doxygen 1.6.1