ICU 59.1
ucnv_err.h File Reference

C UConverter predefined error callbacks.

#include "unicode/utypes.h"

Data Structures

struct  UConverterFromUnicodeArgs
 The structure for the fromUnicode callback function parameter.
 
struct  UConverterToUnicodeArgs
 The structure for the toUnicode callback function parameter.
 

Macros

#define UCNV_SUB_STOP_ON_ILLEGAL   "i"
 FROM_U, TO_U context options for sub callback.
 
#define UCNV_SKIP_STOP_ON_ILLEGAL   "i"
 FROM_U, TO_U context options for skip callback.
 
#define UCNV_ESCAPE_ICU   NULL
 FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to ICU (%UXXXX).
 
#define UCNV_ESCAPE_JAVA   "J"
 FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to JAVA (\uXXXX).
 
#define UCNV_ESCAPE_C   "C"
 FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to C (\uXXXX \UXXXXXXXX). TO_U_CALLBACK_ESCAPE option to escape the character value according to C (\xXXXX).
 
#define UCNV_ESCAPE_XML_DEC   "D"
 FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Decimal escape (&#DDDD;). TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Decimal escape (&#DDDD;).
 
#define UCNV_ESCAPE_XML_HEX   "X"
 FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Hex escape (&#xXXXX;). TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Hex escape (&#xXXXX;).
 
#define UCNV_ESCAPE_UNICODE   "U"
 FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to Unicode (U+XXXXX).
 
#define UCNV_ESCAPE_CSS2   "S"
 FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to CSS2 conventions (\HH..H<space>, that is, a backslash, 1..6 hex digits, and a space).
 

Typedefs

typedef struct UConverter UConverter
 

Enumerations

enum  UConverterCallbackReason {
  UCNV_UNASSIGNED = 0, UCNV_ILLEGAL = 1, UCNV_IRREGULAR = 2, UCNV_RESET = 3,
  UCNV_CLOSE = 4, UCNV_CLONE = 5
}
 The process condition code to be used with the callbacks.
 

Functions

void UCNV_FROM_U_CALLBACK_STOP (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err)
 DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.
 
void UCNV_TO_U_CALLBACK_STOP (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err)
 DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.
 
void UCNV_FROM_U_CALLBACK_SKIP (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err)
 DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback skips any ILLEGAL_SEQUENCE, or only UNASSIGNED_SEQUENCE, depending on the context parameter, simply ignoring those characters.
 
void UCNV_FROM_U_CALLBACK_SUBSTITUTE (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err)
 DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback will Substitute the ILLEGAL SEQUENCE, or UNASSIGNED_SEQUENCE depending on context parameter, with the current substitution string for the converter.
 
void UCNV_FROM_U_CALLBACK_ESCAPE (const void *context, UConverterFromUnicodeArgs *fromUArgs, const UChar *codeUnits, int32_t length, UChar32 codePoint, UConverterCallbackReason reason, UErrorCode *err)
 DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback will Substitute the ILLEGAL SEQUENCE with the hexadecimal representation of the illegal codepoints.
 
void UCNV_TO_U_CALLBACK_SKIP (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err)
 DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback skips any ILLEGAL_SEQUENCE, or only UNASSIGNED_SEQUENCE, depending on the context parameter, simply ignoring those characters.
 
void UCNV_TO_U_CALLBACK_SUBSTITUTE (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err)
 DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback will Substitute the ILLEGAL SEQUENCE, or UNASSIGNED_SEQUENCE depending on context parameter, with the Unicode substitution character, U+FFFD.
 
void UCNV_TO_U_CALLBACK_ESCAPE (const void *context, UConverterToUnicodeArgs *toUArgs, const char *codeUnits, int32_t length, UConverterCallbackReason reason, UErrorCode *err)
 DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback will Substitute the ILLEGAL SEQUENCE with the hexadecimal representation of the illegal bytes (in the format %XNN, e.g. "%XFF%X0A%XC8%X03").
 

Detailed Description

C UConverter predefined error callbacks.

Error Behaviour Functions

Defines some error behaviour functions called by ucnv_{from,to}Unicode. These are provided as part of ICU and many are stable, but they can also be considered only as an example of what can be done with callbacks. You may of course write your own.

If you do, you may find the functions from ucnv_cb.h useful.

These functions, although public, should NEVER be called directly. They should be used as parameters to the ucnv_setFromUCallBack and ucnv_setToUCallBack functions, to set the behaviour of a converter when it encounters ILLEGAL/UNMAPPED/INVALID sequences.

Usage example: 'STOP' doesn't need any context, but newContext could be set to something other than 'NULL' if needed. The available contexts in this header can modify the default behavior of the callback.

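UErrorCode err = U_ZERO_ERROR;
UConverter *myConverter = ucnv_open("ibm-949", &err);
const void *oldContext;
UConverterFromUCallback oldAction;

if (U_SUCCESS(err))
{
    /* Install STOP with a NULL context; the previous callback and
       context are returned through oldAction and oldContext. */
    ucnv_setFromUCallBack(myConverter,
                          UCNV_FROM_U_CALLBACK_STOP,
                          NULL,
                          &oldAction,
                          &oldContext,
                          &err);
}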

The code above tells "myConverter" to stop when it encounters an ILLEGAL/TRUNCATED/INVALID sequence while converting from Unicode -> Codepage. The behavior from Codepage to Unicode is not changed; ucnv_setToUCallBack would need to be called to change that behavior too.

Here is an example with a context:

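UErrorCode err = U_ZERO_ERROR;
UConverter *myConverter = ucnv_open("ibm-949", &err);
const void *oldContext;
UConverterToUCallback oldAction;

if (U_SUCCESS(err))
{
    /* SUBSTITUTE with the "stop on illegal" context option: unassigned
       input is substituted, illegal input stops the conversion. */
    ucnv_setToUCallBack(myConverter,
                        UCNV_TO_U_CALLBACK_SUBSTITUTE,
                        UCNV_SUB_STOP_ON_ILLEGAL,
                        &oldAction,
                        &oldContext,
                        &err);
}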

The code above tells "myConverter" to stop when it encounters an ILLEGAL/TRUNCATED/INVALID sequence while converting from Codepage -> Unicode. Any unmapped but legal character will be replaced by the default substitution character.

Definition in file ucnv_err.h.

Macro Definition Documentation

§ UCNV_ESCAPE_C

#define UCNV_ESCAPE_C   "C"

FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to C (\uXXXX \UXXXXXXXX). TO_U_CALLBACK_ESCAPE option to escape the character value according to C (\xXXXX).

Stable:
ICU 2.0

Definition at line 125 of file ucnv_err.h.

§ UCNV_ESCAPE_CSS2

#define UCNV_ESCAPE_CSS2   "S"

FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to CSS2 conventions (\HH..H<space>, that is, a backslash, 1..6 hex digits, and a space)

Stable:
ICU 4.0

Definition at line 149 of file ucnv_err.h.
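
The CSS2 convention described above (a backslash, 1..6 hex digits, and a space) can be sketched in plain C without ICU. The helper name `css2_escape` is hypothetical, and the use of uppercase hex digits is an assumption; this only illustrates the shape of the escape and is not ICU's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an ICU function: write the CSS2-style escape
   for one code point into out. Returns the number of characters written
   (excluding the terminator), as snprintf does. */
static int css2_escape(unsigned long cp, char *out, size_t cap)
{
    assert(out != NULL && cp <= 0x10FFFFUL);
    /* %lX prints 1..6 hex digits for any value up to 0x10FFFF;
       the trailing space terminates the escape. */
    return snprintf(out, cap, "\\%lX ", cp);
}
```

Under these assumptions, U+00AC would be written as a backslash, "AC", and a space, and U+23456 as a backslash, "23456", and a space.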

§ UCNV_ESCAPE_ICU

#define UCNV_ESCAPE_ICU   NULL

FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to ICU (%UXXXX)

Stable:
ICU 2.0

Definition at line 114 of file ucnv_err.h.

§ UCNV_ESCAPE_JAVA

#define UCNV_ESCAPE_JAVA   "J"

FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to JAVA (\uXXXX)

Stable:
ICU 2.0

Definition at line 119 of file ucnv_err.h.

§ UCNV_ESCAPE_UNICODE

#define UCNV_ESCAPE_UNICODE   "U"

FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to Unicode (U+XXXXX)

Stable:
ICU 2.0

Definition at line 142 of file ucnv_err.h.

§ UCNV_ESCAPE_XML_DEC

#define UCNV_ESCAPE_XML_DEC   "D"

FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Decimal escape (&#DDDD;). TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Decimal escape (&#DDDD;).

Stable:
ICU 2.0

Definition at line 131 of file ucnv_err.h.

§ UCNV_ESCAPE_XML_HEX

#define UCNV_ESCAPE_XML_HEX   "X"

FROM_U_CALLBACK_ESCAPE context option to escape the code unit according to XML Hex escape (&#xXXXX;). TO_U_CALLBACK_ESCAPE context option to escape the character value according to XML Hex escape (&#xXXXX;).

Stable:
ICU 2.0

Definition at line 137 of file ucnv_err.h.

§ UCNV_SKIP_STOP_ON_ILLEGAL

#define UCNV_SKIP_STOP_ON_ILLEGAL   "i"

FROM_U, TO_U context options for skip callback.

Stable:
ICU 2.0

Definition at line 108 of file ucnv_err.h.

§ UCNV_SUB_STOP_ON_ILLEGAL

#define UCNV_SUB_STOP_ON_ILLEGAL   "i"

FROM_U, TO_U context options for sub callback.

Stable:
ICU 2.0

Definition at line 102 of file ucnv_err.h.

Typedef Documentation

§ UConverter

typedef struct UConverter UConverter
Stable:
ICU 2.0

Definition at line 96 of file ucnv_err.h.

Enumeration Type Documentation

§ UConverterCallbackReason

The process condition code to be used with the callbacks.

Codes which are greater than UCNV_IRREGULAR should be passed on to any chained callbacks.

Stable:
ICU 2.0
Enumerator
UCNV_UNASSIGNED 

The code point is unassigned.

The error code U_INVALID_CHAR_FOUND will be set.

UCNV_ILLEGAL 

The code point is illegal.

For example, \x81\x2E is illegal in SJIS because \x2E is not a valid trail byte for the \x81 lead byte. Also, starting with Unicode 3.0.1, non-shortest byte sequences in UTF-8 (like \xC1\xA1 instead of \x61 for U+0061) are also illegal, not just irregular. The error code U_ILLEGAL_CHAR_FOUND will be set.

UCNV_IRREGULAR 

The codepoint is not a regular sequence in the encoding.

For example, \xED\xA0\x80..\xED\xBF\xBF are irregular UTF-8 byte sequences for single surrogate code points. The error code U_INVALID_CHAR_FOUND will be set.

UCNV_RESET 

The callback is called with this reason when a 'reset' has occurred.

Callback should reset all state.

UCNV_CLOSE 

Called when the converter is closed.

The callback should release any allocated memory.

UCNV_CLONE 

Called when ucnv_safeClone() is called on the converter.

The pointer available as the 'context' is an alias to the original converter's context pointer. If the context must be owned by the new converter, the callback must clone the data and call ucnv_setFromUCallBack (or ucnv_setToUCallBack) with the correct pointer.

Stable:
ICU 2.2

Definition at line 157 of file ucnv_err.h.
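
The two UTF-8 examples given for UCNV_ILLEGAL and UCNV_IRREGULAR can be checked by hand. The following is a minimal sketch in plain C that does not use ICU. `classify_utf8` and the `Utf8Class` enum are hypothetical names introduced only for illustration, and the function handles just 2- and 3-byte sequences, which is enough to cover those examples.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration type; not part of ICU. */
typedef enum {
    UTF8_OK,            /* well-formed 2- or 3-byte sequence */
    UTF8_NON_SHORTEST,  /* value could have been encoded in fewer bytes */
    UTF8_SURROGATE,     /* encodes a single surrogate U+D800..U+DFFF */
    UTF8_OTHER          /* anything this sketch does not handle */
} Utf8Class;

/* Classify one 2- or 3-byte UTF-8 sequence. */
static Utf8Class classify_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL);
    if (len == 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        unsigned cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        return cp < 0x80 ? UTF8_NON_SHORTEST : UTF8_OK;
    }
    if (len == 3 && (s[0] & 0xF0) == 0xE0 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        unsigned cp = ((s[0] & 0x0Fu) << 12) |
                      ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
        if (cp < 0x800) {
            return UTF8_NON_SHORTEST;
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return UTF8_SURROGATE;
        }
        return UTF8_OK;
    }
    return UTF8_OTHER;
}
```

In this sketch, \xC1\xA1 is reported as a non-shortest form because it decodes to U+0061, and both \xED\xA0\x80 (U+D800) and \xED\xBF\xBF (U+DFFF) are reported as surrogate encodings.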

Function Documentation

§ UCNV_FROM_U_CALLBACK_ESCAPE()

void UCNV_FROM_U_CALLBACK_ESCAPE ( const void *  context,
UConverterFromUnicodeArgs *  fromUArgs,
const UChar *  codeUnits,
int32_t  length,
UChar32  codePoint,
UConverterCallbackReason  reason,
UErrorCode *  err 
)

DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback will Substitute the ILLEGAL SEQUENCE with the hexadecimal representation of the illegal codepoints.

Parameters
context    The function currently recognizes the callback options:
  • UCNV_ESCAPE_ICU: Substitutes the ILLEGAL SEQUENCE with the hexadecimal representation in the format %UXXXX (e.g. "%UFFFE%U00AC%UC8FE"). In the event the converter doesn't support the characters {%,U}[A-F][0-9], it will substitute the illegal sequence with the substitution characters. Note that a supplementary code point (a surrogate pair) is represented as %UD84D%UDC56.
  • UCNV_ESCAPE_JAVA: Substitutes the ILLEGAL SEQUENCE with the hexadecimal representation in the format \uXXXX (e.g. "\\uFFFE\\u00AC\\uC8FE"). In the event the converter doesn't support the characters {\,u}[A-F][0-9], it will substitute the illegal sequence with the substitution characters. Note that a supplementary code point (a surrogate pair) is represented as \uD84D\uDC56.
  • UCNV_ESCAPE_C: Substitutes the ILLEGAL SEQUENCE with the hexadecimal representation in the format \uXXXX (e.g. "\\uFFFE\\u00AC\\uC8FE"). In the event the converter doesn't support the characters {\,u,U}[A-F][0-9], it will substitute the illegal sequence with the substitution characters. Note that a supplementary code point (a surrogate pair) is represented as \U00023456.
  • UCNV_ESCAPE_XML_DEC: Substitutes the ILLEGAL SEQUENCE with the decimal representation in the format &#DDDDDDDD; (e.g. "&#65534;&#172;&#51454;"). In the event the converter doesn't support the characters {&,#}[0-9], it will substitute the illegal sequence with the substitution characters. Note that a supplementary code point (a surrogate pair) is represented as &#144470; without zero padding.
  • UCNV_ESCAPE_XML_HEX: Substitutes the ILLEGAL SEQUENCE with the hexadecimal representation in the format &#xXXXX; (e.g. "&#xFFFE;&#x00AC;&#xC8FE;"). In the event the converter doesn't support the characters {&,#,x}[0-9], it will substitute the illegal sequence with the substitution characters. Note that a supplementary code point (a surrogate pair) is represented as &#x23456;
fromUArgs  Information about the conversion in progress
codeUnits  Points to 'length' UChars of the concerned Unicode sequence
length     Number of UChars in the concerned Unicode sequence
codePoint  Single UChar32 (UTF-32) containing the concerned Unicode codepoint.
reason     Defines the reason the callback was invoked
err        Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status.
Stable:
ICU 2.0
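
The escape styles listed for the context parameter can be reproduced with a few lines of plain C, which makes the treatment of a supplementary code point such as U+23456 easy to verify. This is a minimal sketch that does not call ICU. `format_escape` is a hypothetical helper; the option letters mirror the macro values in this file ("J", "C", "D", "X", with 0 standing in for the NULL default). Writing the default style as %UXXXX follows the percent-prefixed sample string in the text and is otherwise an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an ICU function: write the escape for one
   code point in the style selected by opt. Returns the number of
   characters written (excluding the terminator), as snprintf does. */
static int format_escape(char opt, unsigned long cp, char *out, size_t cap)
{
    int supplementary = cp > 0xFFFFUL;
    unsigned long hi = 0, lo = 0;

    assert(out != NULL && cp <= 0x10FFFFUL);
    if (supplementary) {
        /* Split into the UTF-16 surrogate pair. */
        hi = 0xD800UL + ((cp - 0x10000UL) >> 10);
        lo = 0xDC00UL + ((cp - 0x10000UL) & 0x3FFUL);
    }
    switch (opt) {
    case 'J':  /* Java: one \uXXXX per UTF-16 code unit */
        return supplementary
            ? snprintf(out, cap, "\\u%04lX\\u%04lX", hi, lo)
            : snprintf(out, cap, "\\u%04lX", cp);
    case 'C':  /* C: \uXXXX, or \UXXXXXXXX above U+FFFF */
        return supplementary
            ? snprintf(out, cap, "\\U%08lX", cp)
            : snprintf(out, cap, "\\u%04lX", cp);
    case 'D':  /* XML decimal, no zero padding */
        return snprintf(out, cap, "&#%lu;", cp);
    case 'X':  /* XML hexadecimal, at least four digits */
        return snprintf(out, cap, "&#x%04lX;", cp);
    default:   /* default style: one %UXXXX per UTF-16 code unit */
        return supplementary
            ? snprintf(out, cap, "%%U%04lX%%U%04lX", hi, lo)
            : snprintf(out, cap, "%%U%04lX", cp);
    }
}
```

For U+23456 the surrogate pair is D84D DC56 and the decimal value is 144470, so the sketch yields the same strings the parameter description quotes for each style.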

§ UCNV_FROM_U_CALLBACK_SKIP()

void UCNV_FROM_U_CALLBACK_SKIP ( const void *  context,
UConverterFromUnicodeArgs *  fromUArgs,
const UChar *  codeUnits,
int32_t  length,
UChar32  codePoint,
UConverterCallbackReason  reason,
UErrorCode *  err 
)

DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback skips any ILLEGAL_SEQUENCE, or only UNASSIGNED_SEQUENCE, depending on the context parameter, simply ignoring those characters.

Parameters
context    The function currently recognizes the callback options:
  • UCNV_SKIP_STOP_ON_ILLEGAL: STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.
  • NULL: Skips any ILLEGAL_SEQUENCE
fromUArgs  Information about the conversion in progress
codeUnits  Points to 'length' UChars of the concerned Unicode sequence
length     Number of UChars in the concerned Unicode sequence
codePoint  Single UChar32 (UTF-32) containing the concerned Unicode codepoint.
reason     Defines the reason the callback was invoked
err        Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status.
Stable:
ICU 2.0
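
How the context option and the callback reason combine can be modelled in a few lines. The following is a simplified sketch in plain C, not ICU's implementation. The `Reason` and `Action` enums and `skip_decision` are hypothetical stand-ins, with `Reason` mirroring the UConverterCallbackReason values listed in this file. Treating an irregular sequence like an illegal one under the "i" option is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins mirroring the values listed in this file; a real
   program would take them from unicode/ucnv_err.h instead. */
typedef enum {
    REASON_UNASSIGNED = 0, REASON_ILLEGAL = 1, REASON_IRREGULAR = 2,
    REASON_RESET = 3, REASON_CLOSE = 4, REASON_CLONE = 5
} Reason;

/* Hypothetical outcome of one callback invocation. */
typedef enum { ACTION_SKIP, ACTION_STOP, ACTION_IGNORE } Action;

/* Decide what a skip-style callback does for one invocation.
   context is NULL or a string such as "i" (UCNV_SKIP_STOP_ON_ILLEGAL). */
static Action skip_decision(const char *context, Reason reason)
{
    assert(reason >= REASON_UNASSIGNED && reason <= REASON_CLONE);
    if (reason > REASON_IRREGULAR) {
        return ACTION_IGNORE;   /* reset, close, clone: nothing to convert */
    }
    if (context == NULL) {
        return ACTION_SKIP;     /* no option: skip every bad sequence */
    }
    if (context[0] == 'i' && reason == REASON_UNASSIGNED) {
        return ACTION_SKIP;     /* "i": skip only unassigned sequences */
    }
    return ACTION_STOP;         /* leave the error set for the caller */
}
```

In a real callback, "skip" would correspond to clearing the error code so that conversion continues, and "stop" to leaving the failure status in place.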

§ UCNV_FROM_U_CALLBACK_STOP()

void UCNV_FROM_U_CALLBACK_STOP ( const void *  context,
UConverterFromUnicodeArgs *  fromUArgs,
const UChar *  codeUnits,
int32_t  length,
UChar32  codePoint,
UConverterCallbackReason  reason,
UErrorCode *  err 
)

DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.

Parameters
context    Pointer to the callback's private data
fromUArgs  Information about the conversion in progress
codeUnits  Points to 'length' UChars of the concerned Unicode sequence
length     Number of UChars in the concerned Unicode sequence
codePoint  Single UChar32 (UTF-32) containing the concerned Unicode codepoint.
reason     Defines the reason the callback was invoked
err        This should always be set to a failure status prior to calling.
Stable:
ICU 2.0

§ UCNV_FROM_U_CALLBACK_SUBSTITUTE()

void UCNV_FROM_U_CALLBACK_SUBSTITUTE ( const void *  context,
UConverterFromUnicodeArgs *  fromUArgs,
const UChar *  codeUnits,
int32_t  length,
UChar32  codePoint,
UConverterCallbackReason  reason,
UErrorCode *  err 
)

DO NOT CALL THIS FUNCTION DIRECTLY! This From Unicode callback will Substitute the ILLEGAL SEQUENCE, or UNASSIGNED_SEQUENCE depending on context parameter, with the current substitution string for the converter.

This is the default callback.

Parameters
context    The function currently recognizes the callback options:
  • UCNV_SUB_STOP_ON_ILLEGAL: STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.
  • NULL: Substitutes any ILLEGAL_SEQUENCE
fromUArgs  Information about the conversion in progress
codeUnits  Points to 'length' UChars of the concerned Unicode sequence
length     Number of UChars in the concerned Unicode sequence
codePoint  Single UChar32 (UTF-32) containing the concerned Unicode codepoint.
reason     Defines the reason the callback was invoked
err        Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status.
See also
ucnv_setSubstChars
Stable:
ICU 2.0

§ UCNV_TO_U_CALLBACK_ESCAPE()

void UCNV_TO_U_CALLBACK_ESCAPE ( const void *  context,
UConverterToUnicodeArgs *  toUArgs,
const char *  codeUnits,
int32_t  length,
UConverterCallbackReason  reason,
UErrorCode *  err 
)

DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback will Substitute the ILLEGAL SEQUENCE with the hexadecimal representation of the illegal bytes (in the format %XNN, e.g. "%XFF%X0A%XC8%X03").

Parameters
context    This function currently recognizes the callback options: UCNV_ESCAPE_ICU, UCNV_ESCAPE_JAVA, UCNV_ESCAPE_C, UCNV_ESCAPE_XML_DEC, UCNV_ESCAPE_XML_HEX and UCNV_ESCAPE_UNICODE.
toUArgs    Information about the conversion in progress
codeUnits  Points to 'length' bytes of the concerned codepage sequence
length     Size (in bytes) of the concerned codepage sequence
reason     Defines the reason the callback was invoked
err        Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status.
Stable:
ICU 2.0
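
The byte-escape format quoted above can be illustrated with a short plain-C sketch that does not call ICU. `escape_bytes` is a hypothetical helper, and only the percent-X style from the sample string is modelled; the other context options are not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an ICU function: render each input byte as
   a percent sign, 'X', and two uppercase hex digits.
   Returns the number of characters written, or -1 if out is too small. */
static int escape_bytes(const unsigned char *bytes, size_t length,
                        char *out, size_t cap)
{
    size_t i, used = 0;

    assert(bytes != NULL && out != NULL);
    if (cap == 0) {
        return -1;
    }
    out[0] = '\0';
    for (i = 0; i < length; ++i) {
        if (cap - used < 5) {   /* 4 characters plus the terminator */
            return -1;
        }
        snprintf(out + used, cap - used, "%%X%02X", (unsigned)bytes[i]);
        used += 4;
    }
    return (int)used;
}
```

Given the four bytes FF 0A C8 03, the sketch produces the same 16-character string as the sample in the description.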

§ UCNV_TO_U_CALLBACK_SKIP()

void UCNV_TO_U_CALLBACK_SKIP ( const void *  context,
UConverterToUnicodeArgs *  toUArgs,
const char *  codeUnits,
int32_t  length,
UConverterCallbackReason  reason,
UErrorCode *  err 
)

DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback skips any ILLEGAL_SEQUENCE, or only UNASSIGNED_SEQUENCE, depending on the context parameter, simply ignoring those characters.

Parameters
context    The function currently recognizes the callback options:
  • UCNV_SKIP_STOP_ON_ILLEGAL: STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.
  • NULL: Skips any ILLEGAL_SEQUENCE
toUArgs    Information about the conversion in progress
codeUnits  Points to 'length' bytes of the concerned codepage sequence
length     Size (in bytes) of the concerned codepage sequence
reason     Defines the reason the callback was invoked
err        Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status.
Stable:
ICU 2.0

§ UCNV_TO_U_CALLBACK_STOP()

void UCNV_TO_U_CALLBACK_STOP ( const void *  context,
UConverterToUnicodeArgs *  toUArgs,
const char *  codeUnits,
int32_t  length,
UConverterCallbackReason  reason,
UErrorCode *  err 
)

DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.

Parameters
context    Pointer to the callback's private data
toUArgs    Information about the conversion in progress
codeUnits  Points to 'length' bytes of the concerned codepage sequence
length     Size (in bytes) of the concerned codepage sequence
reason     Defines the reason the callback was invoked
err        This should always be set to a failure status prior to calling.
Stable:
ICU 2.0

§ UCNV_TO_U_CALLBACK_SUBSTITUTE()

void UCNV_TO_U_CALLBACK_SUBSTITUTE ( const void *  context,
UConverterToUnicodeArgs *  toUArgs,
const char *  codeUnits,
int32_t  length,
UConverterCallbackReason  reason,
UErrorCode *  err 
)

DO NOT CALL THIS FUNCTION DIRECTLY! This To Unicode callback will Substitute the ILLEGAL SEQUENCE, or UNASSIGNED_SEQUENCE depending on context parameter, with the Unicode substitution character, U+FFFD.

Parameters
context    The function currently recognizes the callback options:
  • UCNV_SUB_STOP_ON_ILLEGAL: STOPS at the ILLEGAL_SEQUENCE, returning the error code back to the caller immediately.
  • NULL: Substitutes any ILLEGAL_SEQUENCE
toUArgs    Information about the conversion in progress
codeUnits  Points to 'length' bytes of the concerned codepage sequence
length     Size (in bytes) of the concerned codepage sequence
reason     Defines the reason the callback was invoked
err        Return value will be set to success if the callback was handled, otherwise this value will be set to a failure status.
Stable:
ICU 2.0