Transforms provide a general-purpose package for processing Unicode text. They are a powerful and flexible mechanism for handling a variety of tasks, including script-to-script conversion and conversion of characters to hex notation.
Here are some examples of transforms:
Input | Transform | Output |
キャンパス | Katakana-Latin | kyanpasu |
Αλφαβητικός Κατάλογος | Greek-Latin | Alphabêtikós Katálogos |
биологическом | Cyrillic-Latin | biologichyeskom |
For more information about transforms, please see the Transforms section of the ICU User Guide.
This demo transforms text entered into the Input text area and displays it in Output 1, as specified by the controls labeled Source 1, Target 1, and Variant 1. If Source 1 is set to "(Compound)", then the transform is specified by the contents of Compound 1.
In a similar fashion, the contents of Output 1 are again transformed to produce the contents of Output 2, according to the second set of control inputs, labeled Source 2, Target 2, Variant 2, and Compound 2.
In the Source 2 menu another option is available: "(Inverse)". Selecting this option causes the second transform step (Output 1 to Output 2) to use the inverse of the first transform step (Input to Output 1).
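The two-step flow can be modelled with a toy example. The sketch below is plain Python and does not use ICU. The table `GREEK_TO_LATIN` and the helpers `make_transform` and `invert` are hypothetical names that cover only four letters. It runs step 1 (Input to Output 1) and then step 2 with the inverse of step 1, as the "(Inverse)" option does.

```python
# Toy model of the demo's two steps; not ICU, and the table is illustrative only.
GREEK_TO_LATIN = {"\u03b1": "a", "\u03b2": "b", "\u03b3": "g", "\u03b4": "d"}

def make_transform(table):
    """Return a function that maps each character through `table`,
    leaving characters that are not in the table unchanged."""
    def transform(text):
        return "".join(table.get(ch, ch) for ch in text)
    return transform

def invert(table):
    """Build the reverse table; only valid when the mapping is one-to-one."""
    return {target: source for source, target in table.items()}

step1 = make_transform(GREEK_TO_LATIN)          # Input -> Output 1
step2 = make_transform(invert(GREEK_TO_LATIN))  # Output 1 -> Output 2, "(Inverse)"

output1 = step1("\u03b1\u03b2\u03b3\u03b4")  # Greek alpha, beta, gamma, delta
output2 = step2(output1)
```

In this toy case the round trip reproduces the input because the table is one-to-one. Real transforms are rule-based, and a round trip may not always return the original text; comparing Input with Output 2 in the demo is a quick way to check.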
Each set of controls contains a pop-up menu of standard ICU system transforms. Alternatively, select the special pop-up option "(Compound)" and type a string into the Compound text field under the pop-up. The compound field can be used to create compound transforms or to use filters. A compound transform consists of multiple transforms separated by semicolons, such as "Hangul-Latin; Latin-Greek". This example first transforms the text from Hangul to Latin characters, then converts the result to Greek.
Filters restrict a transform to specified characters. For example, "[a-m]Latin-Greek" converts only the lowercase letters a through m, and "[^ -~]Unicode-Hex" converts every character outside the printable ASCII range (space through tilde) to hex notation. Here are example filters:
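The chaining behaviour of a compound transform can be sketched as function composition. The example below is a toy model in plain Python, not ICU. The `REGISTRY` entries and the `compile_compound` helper are hypothetical and stand in for real transform IDs. It only shows the idea that each step receives the previous step's output.

```python
# Toy model of compound-transform chaining; not ICU.
# The registry names are made up for illustration.
REGISTRY = {
    "Upper": str.upper,
    "Reverse": lambda text: text[::-1],
}

def compile_compound(spec):
    """Split `spec` on semicolons and return one function that applies
    each named step in order, left to right."""
    names = [name.strip() for name in spec.split(";") if name.strip()]
    steps = [REGISTRY[name] for name in names]  # raises KeyError for unknown names

    def run(text):
        for step in steps:
            text = step(text)
        return text

    return run

pipeline = compile_compound("Upper; Reverse")
result = pipeline("abc")
```

Order matters: the leftmost transform in the string runs first, just as "Hangul-Latin; Latin-Greek" produces Latin text before converting it to Greek.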
Filter Pattern | Description |
[axzw] | The characters a, x, z, w |
[x-z] | The characters x, y, z |
[:Lowercase Letter:] | Lowercase letters |
[[:Uppercase Letter:][a-m]] | Uppercase letters plus a-m |
[[:Uppercase Letter:]-[A-M]] | Uppercase letters minus A-M |
For more information see http://icu-project.org/apiref/icu4c/classUnicodeSet.html#_details under "Pattern Syntax".
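The effect of a filter can be sketched as a per-character test placed in front of a transform. The example below is a toy model in plain Python that imitates "[^ -~]Unicode-Hex". It does not parse UnicodeSet patterns, and the helpers `filtered`, `hexify`, and `is_not_printable_ascii` are hypothetical names. The `\uXXXX` output format is an assumption of this sketch.

```python
# Toy model of a filtered transform; not ICU's UnicodeSet parser.
def filtered(predicate, transform):
    """Apply `transform` only to characters for which `predicate` is true;
    all other characters pass through unchanged."""
    def run(text):
        return "".join(transform(ch) if predicate(ch) else ch for ch in text)
    return run

def hexify(ch):
    """Render one character as a backslash-u escape with four hex digits
    (this sketch handles BMP characters only)."""
    return "\\u{:04X}".format(ord(ch))

def is_not_printable_ascii(ch):
    """Rough equivalent of the filter [^ -~]: anything outside space..tilde."""
    return not (" " <= ch <= "~")

hex_non_ascii = filtered(is_not_printable_ascii, hexify)
result = hex_non_ascii("caf\u00e9")  # "cafe" with an acute accent on the e
```

Here the letters c, a, and f fall inside the space-to-tilde range and are left alone, while the accented letter is replaced by its escape.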
For more information on properties (such as "Uppercase Letter") see the Properties section of the ICU User Guide. Note that properties may be abbreviated. For example, "Uppercase Letter" may be abbreviated as "Lu". Abbreviations are given in the Unicode files PropertyAliases.txt and PropertyValueAliases.txt.
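The abbreviated property values can be explored outside ICU as well. The sketch below uses Python's standard `unicodedata` module, whose `category` function returns the two-letter General Category abbreviation such as "Lu". The helper names `in_category` and `upper_minus_a_to_m` are hypothetical. The second function is a rough analogue of the filter [[:Uppercase Letter:]-[A-M]], not an implementation of ICU's set syntax.

```python
import unicodedata

def in_category(abbreviation):
    """Return a predicate that tests a character's General Category abbreviation."""
    return lambda ch: unicodedata.category(ch) == abbreviation

is_uppercase_letter = in_category("Lu")

def upper_minus_a_to_m(ch):
    """Rough analogue of [[:Uppercase Letter:]-[A-M]]: uppercase letters minus A-M."""
    return is_uppercase_letter(ch) and not ("A" <= ch <= "M")

# \u03a9 is GREEK CAPITAL LETTER OMEGA, an uppercase letter outside A-M.
matches = [ch for ch in "AMNZa\u03a9" if upper_minus_a_to_m(ch)]
```

The list keeps N, Z, and the Greek capital omega; A and M are removed by the set difference, and the lowercase a is not in category "Lu".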
Select a sample from the Insert Samples menu. You'll see some sample text in different scripts in the Input area. Now press the Transform button. You'll see the same text, transformed, in the Output 1 area, as a result of the first transform (the specific transform depends on which sample is selected). In the Output 2 area, you'll see the text transformed again as a result of the second transform.
The sample links point to sample text in various scripts on the Unicode website. This text can be pasted into the Input area to see the effect of various transforms on actual text in different scripts.
The demo displays characters using UTF-8. Your browser must be configured properly to see UTF-8 characters. If you have any difficulties, please refer to the display problems page on the Unicode website.
This demo has only been tested on Internet Explorer.
If you encounter any problems, please file a bug in the ICU bug database. For general questions or comments please send email to one of our mailing lists.