
International Components for Unicode

Transform Demo Help

Transforms provide a general-purpose package for processing Unicode text. They are a powerful and flexible mechanism for handling a variety of tasks, including:

  • Uppercase, Lowercase, Titlecase, Full/Halfwidth conversions
  • Normalization
  • Hex and Character Name conversions
  • Script to Script conversion
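Python's standard library is not ICU, but it has rough analogues of several of these tasks, which may help make the list concrete. The sketch below uses only `str` methods and the `unicodedata` module. It does not reproduce ICU's transform IDs or output formats, and it does not attempt script-to-script conversion.

```python
import unicodedata

word = "café"

# Case conversions (the Upper/Lower/Title tasks)
upper = word.upper()                      # "CAFÉ"
title = word.title()                      # "Café"

# Normalization: NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT
decomposed = unicodedata.normalize("NFD", word)

# Fullwidth to halfwidth: NFKC folds compatibility forms such as U+FF21 "Ａ",
# although it also folds many characters that a width-only transform would keep
halfwidth = unicodedata.normalize("NFKC", "ＡＢＣ")

# Hex conversion: one \uXXXX-style escape per code point (BMP only here)
hexed = "".join(f"\\u{ord(c):04X}" for c in word)

# Character names
names = [unicodedata.name(c) for c in word]

print(upper, title, len(decomposed), halfwidth)   # CAFÉ Café 5 ABC
print(hexed)                                      # \u0063\u0061\u0066\u00E9
print(names[-1])                                  # LATIN SMALL LETTER E WITH ACUTE
```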

Here are some examples of transforms:

Input                    Transform         Output
キャンパス               Katakana-Latin    kyanpasu
Αλφαβητικός Κατάλογος    Greek-Latin       Alphabêtikós Katálogos
биологическом            Cyrillic-Latin    biologichyeskom

For more information about transforms, please see the Transforms section of the ICU User Guide.

How To Use This Demo

  1. Enter some text into the Input text area.
  2. Choose a transform using the Source 1/Target 1/Variant 1 controls, or set Source 1 to "(Compound)" and enter a transform name in the Compound 1 text area.
  3. Press the Transform button and view the result in the Output 1 text area.
  4. Optionally, choose a second transform using the second set of controls to apply another transform to the Output 1 text. View the result in the Output 2 text area.

This demo transforms text entered into the Input text area and displays it in Output 1, as specified by the controls labeled Source 1, Target 1, and Variant 1. If Source 1 is set to "(Compound)", then the transform is specified by the contents of Compound 1.

In a similar fashion, the contents of Output 1 are again transformed to produce the contents of Output 2, according to the second set of control inputs, labeled Source 2, Target 2, Variant 2, and Compound 2.

In the Source 2 menu another option is available: "(Inverse)". Selecting this option causes the second transform step (Output 1 to Output 2) to use the inverse of the first transform step (Input to Output 1).
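The two-step flow, including the "(Inverse)" option, can be modeled as a pair of functions in which the inverse simply swaps directions. In this sketch, `Transform`, `to_hex`, and `from_hex` are hypothetical stand-ins for an invertible transform, chosen because hex escaping round-trips exactly; this is not how ICU implements inverses, and not every real transform round-trips this cleanly.

```python
import re
from dataclasses import dataclass
from typing import Callable


def to_hex(s: str) -> str:
    """Replace every code point with a \\uXXXX or \\UXXXXXXXX escape."""
    return "".join(
        f"\\u{ord(c):04X}" if ord(c) <= 0xFFFF else f"\\U{ord(c):08X}" for c in s
    )


_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def from_hex(s: str) -> str:
    """Inverse of to_hex: turn escapes back into characters."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), s)


@dataclass
class Transform:
    forward: Callable[[str], str]
    backward: Callable[[str], str]

    def inverse(self) -> "Transform":
        # Swapping the two directions yields the inverse transform
        return Transform(self.backward, self.forward)


step1 = Transform(to_hex, from_hex)   # Input -> Output 1
step2 = step1.inverse()               # like choosing "(Inverse)" in Source 2

text = "キャンパス"
output1 = step1.forward(text)
output2 = step2.forward(output1)      # Output 1 -> Output 2

print(output1)           # \u30AD\u30E3\u30F3\u30D1\u30B9
print(output2 == text)   # True
```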

Specifying a Transform

Each set of controls contains a pop-up menu of standard ICU system transforms. Alternatively, the special pop-up option "(Compound)" may be selected and a transform specification typed into the Compound text field under the pop-up. The compound field can be used to create compound transforms or to apply filters. A compound transform consists of multiple transforms separated by semicolons, such as "Hangul-Latin; Latin-Greek". This example first transforms the text from Hangul to Latin characters, then converts the result to Greek.
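A compound ID can be thought of as function composition: split on semicolons, look up each piece, and apply the steps left to right. The registry below is hypothetical; it borrows ICU-style IDs such as `Any-Upper` only as labels for simple Python functions, and it supports neither filters nor the many IDs a real transform service would offer.

```python
from functools import reduce
from typing import Callable, Dict

Transform = Callable[[str], str]

# Hypothetical registry: ICU-style IDs used as labels for Python stand-ins
REGISTRY: Dict[str, Transform] = {
    "Any-Upper": str.upper,
    "Any-Lower": str.lower,
    "Any-Hex": lambda s: "".join(f"\\u{ord(c):04X}" for c in s),
}


def compound(spec: str) -> Transform:
    """Build one transform from IDs separated by semicolons, applied left to right."""
    ids = [part.strip() for part in spec.split(";") if part.strip()]
    steps = [REGISTRY[i] for i in ids]   # raises KeyError for an unknown ID
    return lambda text: reduce(lambda acc, step: step(acc), steps, text)


t = compound("Any-Upper; Any-Hex")
print(t("ab"))   # \u0041\u0042
```

Because each step receives the previous step's output, order matters: swapping the two IDs here would hex-escape first and then uppercase the escape text.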

Filters restrict a transform to specified characters. For example, "[a-m]Latin-Greek" converts only the lowercase letters a through m, and "[^ -~]Unicode-Hex" converts to hex every character except printable ASCII (space through tilde). Here are example filters:

Filter pattern                  Description
[axzw]                          The characters a, x, z, w
[x-z]                           The characters x, y, z
[:Lowercase Letter:]            Lowercase letters
[[:Uppercase Letter:][a-m]]     Uppercase letters plus a-m
[[:Uppercase Letter:]-[A-M]]    Uppercase letters minus A-M
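The filter patterns above behave like character predicates that can be combined by union and difference, and a filtered transform leaves non-matching characters alone. The sketch below models both ideas with hypothetical helpers (`char_range`, `union`, `difference`, `complement`, `filtered`); `str.upper` and a hex-escaping function stand in for `Latin-Greek` and `Unicode-Hex`. It assumes each maximal run of matching characters is transformed as a unit, which approximates, but may not exactly match, how ICU treats context at run boundaries.

```python
import unicodedata
from typing import Callable

CharSet = Callable[[str], bool]
Transform = Callable[[str], str]


def chars(members: str) -> CharSet:             # like [axzw]
    return lambda c: c in members


def char_range(lo: str, hi: str) -> CharSet:    # like [x-z]
    return lambda c: lo <= c <= hi


def category(gc: str) -> CharSet:               # like [:Lu:]
    return lambda c: unicodedata.category(c) == gc


def union(a: CharSet, b: CharSet) -> CharSet:        # like [[...][...]]
    return lambda c: a(c) or b(c)


def difference(a: CharSet, b: CharSet) -> CharSet:   # like [[...]-[...]]
    return lambda c: a(c) and not b(c)


def complement(a: CharSet) -> CharSet:               # like [^...]
    return lambda c: not a(c)


def filtered(keep: CharSet, transform: Transform) -> Transform:
    """Transform each maximal run of characters accepted by `keep`;
    pass every other character through unchanged."""
    def run(text: str) -> str:
        out, pending = [], []
        for c in text:
            if keep(c):
                pending.append(c)
                continue
            if pending:
                out.append(transform("".join(pending)))
                pending = []
            out.append(c)
        if pending:
            out.append(transform("".join(pending)))
        return "".join(out)
    return run


def to_hex(s: str) -> str:
    # \uXXXX-style escapes; BMP only in this sketch
    return "".join(f"\\u{ord(c):04X}" for c in s)


upper_plus_a_m = union(category("Lu"), char_range("a", "m"))
upper_minus_A_M = difference(category("Lu"), char_range("A", "M"))

print([c for c in "AZaz" if upper_plus_a_m(c)])    # ['A', 'Z', 'a']
print([c for c in "AZaz" if upper_minus_A_M(c)])   # ['Z']

# Rough analogues of "[a-m]Latin-Greek" and "[^ -~]Unicode-Hex"
print(filtered(char_range("a", "m"), str.upper)("hello"))           # HELLo
print(filtered(complement(char_range(" ", "~")), to_hex)("café"))   # caf\u00E9
```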

For more information, see http://icu-project.org/apiref/icu4c/classUnicodeSet.html#_details under "Pattern Syntax".

For more information on properties (such as "Uppercase Letter"), see the Properties section of the ICU User Guide. Note that property names and values may be abbreviated. For example, "Uppercase Letter" may be abbreviated as "Lu". Abbreviations are given in the Unicode files PropertyAliases.txt and PropertyValueAliases.txt.
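To illustrate how a long property value name and its abbreviation can refer to the same category, the sketch below maps a few General_Category aliases (hand-copied from PropertyValueAliases.txt) to their short forms. It uses a simplified version of the Unicode loose-matching rule (UAX #44, LM3) that ignores case, spaces, underscores, and hyphens; the helper names are hypothetical. Python's `unicodedata.category` happens to return the short form directly.

```python
import unicodedata

# A small subset of General_Category aliases: short form -> long form
GC_ALIASES = {
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Lt": "Titlecase_Letter",
    "Nd": "Decimal_Number",
}


def loose(name: str) -> str:
    """Simplified loose matching: drop case, spaces, underscores, hyphens."""
    return "".join(ch for ch in name.lower() if ch not in " _-")


# Index both the short and the long spelling under their loose keys
_LOOKUP = {}
for short, long_name in GC_ALIASES.items():
    _LOOKUP[loose(short)] = short
    _LOOKUP[loose(long_name)] = short


def to_short_gc(name: str) -> str:
    return _LOOKUP[loose(name)]   # raises KeyError for names not in the subset


def has_gc(c: str, name: str) -> bool:
    return unicodedata.category(c) == to_short_gc(name)


print(to_short_gc("Uppercase Letter"))   # Lu
print(has_gc("Q", "uppercase_letter"))   # True
```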

Walkthrough

Select a sample from the Insert Samples menu. You'll see some sample text in different scripts in the Input area. Now press the Transform button. You'll see the same text, transformed, in the Output 1 area, as a result of the first transform (the specific transform depends on which sample is selected). In the Output 2 area, you'll see the text transformed again as a result of the second transform.

Samples

The sample links point to sample text in various scripts on the Unicode website. This text can be pasted into the Input area to see the effect of various transforms on actual text in different scripts.

Display Problems

The demo displays characters using UTF-8, so your browser must be configured to display UTF-8 text. If you have any difficulties, please refer to the display problems page on the Unicode website.
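To see why the browser's encoding setting matters, the sketch below encodes the Katakana sample as UTF-8 and then decodes the same bytes two ways. Latin-1 is used only as an example of a wrong guess; a misconfigured browser may fall back to some other encoding, but the effect (garbled "mojibake" text) is similar.

```python
text = "キャンパス"
data = text.encode("utf-8")   # each of these katakana takes 3 bytes in UTF-8
print(len(text), len(data))   # 5 15

# Decoding with the wrong encoding produces garbage instead of the original
wrong = data.decode("latin-1")
right = data.decode("utf-8")
print(wrong == text, right == text)   # False True
```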

Supported Browsers

This demo has only been tested on Internet Explorer.

Feedback

If you encounter any problems, please file a bug in the ICU bug database. For general questions or comments, please send email to one of our mailing lists.