ucasemap.h File Reference

C API: Unicode case mapping functions using a UCaseMap service object.

#include "unicode/utypes.h"
#include "unicode/ustring.h"
#include "unicode/localpointer.h"

Defines

#define U_TITLECASE_NO_LOWERCASE   0x100
 Do not lowercase non-initial parts of words when titlecasing.
#define U_TITLECASE_NO_BREAK_ADJUSTMENT   0x200
 Do not adjust the titlecasing indexes from BreakIterator::next() indexes; titlecase exactly the characters at breaks from the iterator.

Typedefs

typedef struct UCaseMap UCaseMap
 C typedef for struct UCaseMap.

Functions

UCaseMap * ucasemap_open (const char *locale, uint32_t options, UErrorCode *pErrorCode)
 Open a UCaseMap service object for a locale and a set of options.
void ucasemap_close (UCaseMap *csm)
 Close a UCaseMap service object.
const char * ucasemap_getLocale (const UCaseMap *csm)
 Get the locale ID that is used for language-dependent case mappings.
uint32_t ucasemap_getOptions (const UCaseMap *csm)
 Get the options bit set that is used for case folding and string comparisons.
void ucasemap_setLocale (UCaseMap *csm, const char *locale, UErrorCode *pErrorCode)
 Set the locale ID that is used for language-dependent case mappings.
void ucasemap_setOptions (UCaseMap *csm, uint32_t options, UErrorCode *pErrorCode)
 Set the options bit set that is used for case folding and string comparisons.
const UBreakIterator * ucasemap_getBreakIterator (const UCaseMap *csm)
 Get the break iterator that is used for titlecasing.
void ucasemap_setBreakIterator (UCaseMap *csm, UBreakIterator *iterToAdopt, UErrorCode *pErrorCode)
 Set the break iterator that is used for titlecasing.
int32_t ucasemap_toTitle (UCaseMap *csm, UChar *dest, int32_t destCapacity, const UChar *src, int32_t srcLength, UErrorCode *pErrorCode)
 Titlecase a UTF-16 string.
int32_t ucasemap_utf8ToLower (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Lowercase the characters in a UTF-8 string.
int32_t ucasemap_utf8ToUpper (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Uppercase the characters in a UTF-8 string.
int32_t ucasemap_utf8ToTitle (UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Titlecase a UTF-8 string.
int32_t ucasemap_utf8FoldCase (const UCaseMap *csm, char *dest, int32_t destCapacity, const char *src, int32_t srcLength, UErrorCode *pErrorCode)
 Case-fold the characters in a UTF-8 string.

Detailed Description

C API: Unicode case mapping functions using a UCaseMap service object.

The service object takes care of memory allocations, data loading, and setup for the attributes, as usual.

Currently, the functionality provided here does not overlap with uchar.h and ustring.h, except for ucasemap_toTitle().

ucasemap_utf8XYZ() functions operate directly on UTF-8 strings.

Definition in file ucasemap.h.


Define Documentation

#define U_TITLECASE_NO_BREAK_ADJUSTMENT   0x200

Do not adjust the titlecasing indexes from BreakIterator::next() indexes; titlecase exactly the characters at breaks from the iterator.

Option bit for titlecasing APIs that take an options bit set.

By default, titlecasing will take each break iterator index, adjust it by looking for the next cased character, and titlecase that one. Other characters are lowercased.

This follows Unicode 4 & 5 section 3.13 Default Case Operations:

R3 toTitlecase(X): Find the word boundaries based on Unicode Standard Annex #29, "Text Boundaries." Between each pair of word boundaries, find the first cased character F. If F exists, map F to default_title(F); then map each subsequent character C to default_lower(C).
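
The effect of the adjustment can be sketched with a small ASCII-only model. This is not ICU code: the function name model_title_breaks, the macro MODEL_NO_BREAK_ADJUSTMENT, and the rule that breaks fall at index 0 and after each space are all assumptions made for illustration. A real break iterator and real Unicode case mappings behave in a richer way.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Same bit value as U_TITLECASE_NO_BREAK_ADJUSTMENT; the name is local to this sketch. */
#define MODEL_NO_BREAK_ADJUSTMENT 0x200u

/*
 * Hypothetical ASCII-only model, not ICU code. Break indexes are assumed
 * to be 0 and the index after each space. dest must hold strlen(src) + 1 bytes.
 */
void model_title_breaks(const char *src, char *dest, unsigned options) {
    assert(src != NULL && dest != NULL);
    size_t length = strlen(src);
    size_t start = 0;                     /* current break index */
    while (start < length) {
        /* Find the next break index: one past the next space, or the end. */
        size_t end = start;
        while (end < length && src[end] != ' ') {
            end++;
        }
        if (end < length) {
            end++;
        }
        /* Choose the index to titlecase. */
        size_t title = start;
        if (!(options & MODEL_NO_BREAK_ADJUSTMENT)) {
            /* Default: adjust forward to the first cased character. */
            while (title < end && !isalpha((unsigned char)src[title])) {
                title++;
            }
        }
        /* Titlecase that one character and lowercase the others. */
        for (size_t k = start; k < end; k++) {
            unsigned char c = (unsigned char)src[k];
            dest[k] = (char)(k == title ? toupper(c) : tolower(c));
        }
        start = end;
    }
    dest[length] = '\0';
}
```

Under these assumptions, the input "'twas (nice) DAY" becomes "'Twas (Nice) Day" by default. With the option set it becomes "'twas (nice) Day", because the characters sitting exactly at the first two breaks are the uncased "'" and "(".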

See also:
ucasemap_setOptions
ucasemap_toTitle
ucasemap_utf8ToTitle
UnicodeString::toTitle
U_TITLECASE_NO_LOWERCASE
Stable:
ICU 3.8

Definition at line 184 of file ucasemap.h.

#define U_TITLECASE_NO_LOWERCASE   0x100

Do not lowercase non-initial parts of words when titlecasing.

Option bit for titlecasing APIs that take an options bit set.

By default, titlecasing will titlecase the first cased character of a word and lowercase all other characters. With this option, the other characters will not be modified.
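
The difference between the default and this option can be illustrated with a simplified ASCII-only model. The names model_title_words and MODEL_TITLECASE_NO_LOWERCASE are hypothetical, and treating a word as a run of non-space characters is an assumption of the sketch; ICU itself relies on a break iterator and full Unicode case mappings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Same bit value as U_TITLECASE_NO_LOWERCASE; the name is local to this sketch. */
#define MODEL_TITLECASE_NO_LOWERCASE 0x100u

/*
 * Hypothetical ASCII-only model, not ICU code. A "word" is a run of
 * non-space characters. dest must hold strlen(src) + 1 bytes.
 */
void model_title_words(const char *src, char *dest, unsigned options) {
    assert(src != NULL && dest != NULL);
    size_t length = strlen(src);
    int seen_cased = 0;  /* has the current word's first cased character been seen? */
    for (size_t i = 0; i < length; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c == ' ') {
            seen_cased = 0;               /* a space ends the word */
            dest[i] = ' ';
        } else if (!seen_cased && isalpha(c)) {
            seen_cased = 1;
            dest[i] = (char)toupper(c);   /* first cased character: titlecase it */
        } else if (seen_cased && !(options & MODEL_TITLECASE_NO_LOWERCASE)) {
            dest[i] = (char)tolower(c);   /* default: lowercase the rest of the word */
        } else {
            dest[i] = (char)c;            /* leading uncased character, or option set: copy */
        }
    }
    dest[length] = '\0';
}
```

In this model, "mcDONALD's iPhone" becomes "Mcdonald's Iphone" by default and "McDONALD's IPhone" when the option bit is set.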

See also:
ucasemap_setOptions
ucasemap_toTitle
ucasemap_utf8ToTitle
UnicodeString::toTitle
Stable:
ICU 3.8

Definition at line 159 of file ucasemap.h.


Typedef Documentation

typedef struct UCaseMap UCaseMap

C typedef for struct UCaseMap.

Stable:
ICU 3.4

Definition at line 45 of file ucasemap.h.


Function Documentation

void ucasemap_close ( UCaseMap *  csm  )

Close a UCaseMap service object.

Parameters:
csm Object to be closed.
Stable:
ICU 3.4

const UBreakIterator* ucasemap_getBreakIterator ( const UCaseMap *  csm  )

Get the break iterator that is used for titlecasing.

Do not modify the returned break iterator.

Parameters:
csm UCaseMap service object.
Returns:
titlecasing break iterator
Stable:
ICU 3.8

const char* ucasemap_getLocale ( const UCaseMap *  csm  )

Get the locale ID that is used for language-dependent case mappings.

Parameters:
csm UCaseMap service object.
Returns:
locale ID
Stable:
ICU 3.4

uint32_t ucasemap_getOptions ( const UCaseMap *  csm  )

Get the options bit set that is used for case folding and string comparisons.

Parameters:
csm UCaseMap service object.
Returns:
options bit set
Stable:
ICU 3.4

UCaseMap* ucasemap_open ( const char *  locale,
uint32_t  options,
UErrorCode *  pErrorCode
)

Open a UCaseMap service object for a locale and a set of options.

The locale ID and options are preprocessed so that functions using the service object need not process them in each call.

Parameters:
locale ICU locale ID, used for language-dependent upper-/lower-/title-casing according to the Unicode standard. Usual semantics: ""=root, NULL=default locale, etc.
options Options bit set, used for case folding and string comparisons. Same flags as for u_foldCase(), u_strFoldCase(), u_strCaseCompare(), etc. Use 0 or U_FOLD_CASE_DEFAULT for default behavior.
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns:
Pointer to a UCaseMap service object, if successful.
See also:
U_FOLD_CASE_DEFAULT
U_FOLD_CASE_EXCLUDE_SPECIAL_I
U_TITLECASE_NO_LOWERCASE
U_TITLECASE_NO_BREAK_ADJUSTMENT
Stable:
ICU 3.4

void ucasemap_setBreakIterator ( UCaseMap *  csm,
UBreakIterator *  iterToAdopt,
UErrorCode *  pErrorCode
)

Set the break iterator that is used for titlecasing.

The UCaseMap service object releases a previously set break iterator and "adopts" this new one, taking ownership of it. It will be released in a subsequent call to ucasemap_setBreakIterator() or ucasemap_close().

Break iterator operations are not thread-safe. Therefore, titlecasing functions use non-const UCaseMap objects. It is not possible to titlecase strings concurrently using the same UCaseMap.

Parameters:
csm UCaseMap service object.
iterToAdopt Break iterator to be adopted for titlecasing.
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also:
ucasemap_toTitle
ucasemap_utf8ToTitle
Stable:
ICU 3.8

void ucasemap_setLocale ( UCaseMap *  csm,
const char *  locale,
UErrorCode *  pErrorCode
)

Set the locale ID that is used for language-dependent case mappings.

Parameters:
csm UCaseMap service object.
locale Locale ID, see ucasemap_open().
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also:
ucasemap_open
Stable:
ICU 3.4

void ucasemap_setOptions ( UCaseMap *  csm,
uint32_t  options,
UErrorCode *  pErrorCode
)

Set the options bit set that is used for case folding and string comparisons.

Parameters:
csm UCaseMap service object.
options Options bit set, see ucasemap_open().
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
See also:
ucasemap_open
Stable:
ICU 3.4

int32_t ucasemap_toTitle ( UCaseMap *  csm,
UChar *  dest,
int32_t  destCapacity,
const UChar *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Titlecase a UTF-16 string.

This function is almost a duplicate of u_strToTitle(), except that it takes ucasemap_setOptions() into account and has performance advantages from being able to use a UCaseMap object for multiple case mapping operations, saving setup time.

Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with ucasemap_setOptions().)

Note: This function takes a non-const UCaseMap pointer because it will open a default break iterator if no break iterator was set yet, and effectively call ucasemap_setBreakIterator(); also because the break iterator is stateful and will be modified during the iteration.

The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. The standard titlecase iterator for the root locale implements the algorithm of Unicode TR 21.

This function uses only the setUText(), first(), next() and close() methods of the provided break iterator.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
csm UCaseMap service object. This pointer is non-const! See the note above for details.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity The size of the buffer (number of UChars). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns:
The length of the result string, whether the call succeeds or the buffer overflows; on overflow, the returned length is greater than destCapacity.
See also:
u_strToTitle
Stable:
ICU 3.8

int32_t ucasemap_utf8FoldCase ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Case-fold the characters in a UTF-8 string.

Case-folding is locale-independent and not context-sensitive, but there is an option for whether to include or exclude mappings for dotted I and dotless i that are marked with 'I' in CaseFolding.txt. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
csm UCaseMap service object.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns:
The length of the result string, whether the call succeeds or the buffer overflows; on overflow, the returned length is greater than destCapacity.
See also:
u_strFoldCase
ucasemap_setOptions
U_FOLD_CASE_DEFAULT
U_FOLD_CASE_EXCLUDE_SPECIAL_I
Stable:
ICU 3.8

int32_t ucasemap_utf8ToLower ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Lowercase the characters in a UTF-8 string.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
csm UCaseMap service object.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns:
The length of the result string, whether the call succeeds or the buffer overflows; on overflow, the returned length is greater than destCapacity.
See also:
u_strToLower
Stable:
ICU 3.4

int32_t ucasemap_utf8ToTitle ( UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Titlecase a UTF-8 string.

Casing is locale-dependent and context-sensitive. Titlecasing uses a break iterator to find the first characters of words that are to be titlecased. It titlecases those characters and lowercases all others. (This can be modified with ucasemap_setOptions().)

Note: This function takes a non-const UCaseMap pointer because it will open a default break iterator if no break iterator was set yet, and effectively call ucasemap_setBreakIterator(); also because the break iterator is stateful and will be modified during the iteration.

The titlecase break iterator can be provided to customize for arbitrary styles, using rules and dictionaries beyond the standard iterators. The standard titlecase iterator for the root locale implements the algorithm of Unicode TR 21.

This function uses only the setUText(), first(), next() and close() methods of the provided break iterator.

The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
csm UCaseMap service object. This pointer is non-const! See the note above for details.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns:
The length of the result string, whether the call succeeds or the buffer overflows; on overflow, the returned length is greater than destCapacity.
See also:
u_strToTitle
U_TITLECASE_NO_LOWERCASE
U_TITLECASE_NO_BREAK_ADJUSTMENT
Stable:
ICU 3.8

int32_t ucasemap_utf8ToUpper ( const UCaseMap *  csm,
char *  dest,
int32_t  destCapacity,
const char *  src,
int32_t  srcLength,
UErrorCode *  pErrorCode
)

Uppercase the characters in a UTF-8 string.

Casing is locale-dependent and context-sensitive. The result may be longer or shorter than the original. The source string and the destination buffer must not overlap.

Parameters:
csm UCaseMap service object.
dest A buffer for the result string. The result will be NUL-terminated if the buffer is large enough. The contents are undefined in case of failure.
destCapacity The size of the buffer (number of bytes). If it is 0, then dest may be NULL and the function will only return the length of the result without writing any of the result string.
src The original string.
srcLength The length of the original string. If -1, then src must be NUL-terminated.
pErrorCode Must be a valid pointer to an error code value, which must not indicate a failure before the function call.
Returns:
The length of the result string, whether the call succeeds or the buffer overflows; on overflow, the returned length is greater than destCapacity.
See also:
u_strToUpper
Stable:
ICU 3.4
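
The dest/destCapacity/srcLength contract shared by the string functions above suggests a two-pass calling pattern: call once with a NULL buffer of capacity 0 to learn the length, then allocate and call again. The sketch below shows that pattern with a hypothetical ASCII-only stand-in, model_to_upper, which is not an ICU function and omits the csm and pErrorCode parameters for brevity. In ICU the overflow is also reported through *pErrorCode (U_BUFFER_OVERFLOW_ERROR), so a real caller would reset the error code before the second call.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical ASCII-only stand-in, not an ICU function. It follows the
 * documented buffer contract: it returns the full result length, writes
 * only what fits, and NUL-terminates only if there is room.
 */
int32_t model_to_upper(char *dest, int32_t destCapacity,
                       const char *src, int32_t srcLength) {
    assert(src != NULL && destCapacity >= 0);
    assert(dest != NULL || destCapacity == 0);
    /* srcLength of -1 means that src is NUL-terminated. */
    int32_t length = srcLength < 0 ? (int32_t)strlen(src) : srcLength;
    for (int32_t i = 0; i < length && i < destCapacity; i++) {
        dest[i] = (char)toupper((unsigned char)src[i]);
    }
    if (length < destCapacity) {
        dest[length] = '\0';
    }
    return length;
}

/* Two-pass caller: preflight for the length, then allocate and convert. */
char *to_upper_alloc(const char *src) {
    int32_t needed = model_to_upper(NULL, 0, src, -1);   /* pass 1: length only */
    char *buf = malloc((size_t)needed + 1);               /* +1 for the NUL */
    if (buf != NULL) {
        int32_t written = model_to_upper(buf, needed + 1, src, -1);
        assert(written == needed);
        (void)written;
    }
    return buf;   /* caller frees; NULL if allocation failed */
}
```

A caller that prefers a fixed buffer can skip the first pass and compare the returned length against destCapacity to detect truncation.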

Generated on Sat Jan 23 15:17:36 2010 for ICU 4.3.4 by  doxygen 1.6.1