ICU Core Meeting Notes
Jan 3, 2007
- at UnicodeSet and CollectionUtilities, make a proposal
- around with different options for the .cnv binary file and report back to
the group with a recommendation.
- Update the SVN guide with instructions for SVN branching
- Look at all versions of ICU that are affected
by the IDNA buffer overflow bug and scope out the work
- Work on the SVN file corruption problem and report back
to the team
- Test the .cnv data structure changes with ICU4J and make
sure the file is backward compatible
Buffer Overflow Bug in IDNA
The team looked at the IDNA buffer overflow bug and asked Ram to work
on resolving the bug.
- IDNA algorithm should limit the domain label length to
63 on output
- IDNA algorithm should conform to the spec and not allow
domain labels longer than 63. In the future, we may add options for
allowing domain labels longer than 63.
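As a rough illustration of the length rule above, the sketch below checks that no label in an already-converted ASCII domain name exceeds 63 bytes. The helper name labels_within_limit is hypothetical and is not an ICU function; the sketch only looks at length and does not do any other IDNA validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Maximum length of one domain label, per the limit discussed above. */
#define MAX_LABEL_LENGTH 63

/*
 * Hypothetical helper (not part of ICU): returns true if no
 * dot-separated label in the ASCII domain name is longer than
 * MAX_LABEL_LENGTH bytes. Empty labels are not rejected here.
 */
static bool labels_within_limit(const char *domain)
{
    assert(domain != NULL);
    size_t label_len = 0;
    for (const char *p = domain; *p != '\0'; ++p) {
        if (*p == '.') {
            label_len = 0;              /* a dot starts a new label */
        } else if (++label_len > MAX_LABEL_LENGTH) {
            return false;               /* this label is too long */
        }
    }
    return true;
}
```

A caller would run such a check on the output side, after a label has been converted to its ASCII form, and report an error instead of writing past a fixed-size label buffer.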
Conversion Performance and Data File
Markus summarized the conversion data file changes and performance improvements.
Key Points (copied from email from Markus):
- While the main innovation is avoiding the UTF-16 pivot when
converting from UTF-8 to something else, some modifications to the .cnv
fromUnicode data structures (which are backward compatible) were
needed. These changes also allowed some fast paths and significant
performance improvements for conversion from UTF-16 to some charsets
(via ucnv_fromUnicode()). There is no API change.
- 56-100% speed up for single byte code pages
- Porting to ICU4J not necessary since the data format is backward compatible
- For UTF-16->MBCS, I (Markus) took
advantage of the modified .cnv data structure, which accounts for about
two thirds of the improvement, and an ASCII fast path, which accounts
for the rest.
- For UTF-16->SBCS, I (Markus) only used an ASCII fast
path for these measurements. I (Markus) only saw lower performance when
trying to use the modified data structure for the code points where it
is available. Note that the ASCII fast path makes conversion slower
when the proportion of ASCII is low. (The BMP-to-SBCS implementation
already had a very tight loop.)
- The modified .cnv data files are about 6.5% larger than
before, over all of ICU's default .cnv files.
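The ASCII fast path mentioned in the points above can be sketched roughly as follows. This is not the ICU implementation: the function names are hypothetical, the lookup is a stand-in that treats the target as ISO-8859-1 with 0x1A as the substitution byte, and it assumes an ASCII-compatible charset in which U+0000..U+007F map to the same byte values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the table-driven lookup. For illustration it
 * maps U+0000..U+00FF to the same byte value and everything else to the
 * substitution byte 0x1A. Real .cnv lookups are more involved.
 */
static uint8_t lookup_sbcs(uint16_t c)
{
    return c <= 0xFF ? (uint8_t)c : 0x1A;
}

/*
 * Converts len UTF-16 code units to single bytes; dst must hold len bytes.
 * Surrogate pairs are not combined in this sketch: each surrogate code
 * unit becomes one substitution byte. Returns the number of bytes written.
 */
static size_t utf16_to_sbcs(const uint16_t *src, size_t len, uint8_t *dst)
{
    assert((src != NULL && dst != NULL) || len == 0);
    size_t i = 0;
    while (i < len) {
        /* Fast path: copy a run of ASCII code units with no table lookup. */
        while (i < len && src[i] <= 0x7F) {
            dst[i] = (uint8_t)src[i];
            ++i;
        }
        /* Slow path: one lookup per non-ASCII code unit. */
        while (i < len && src[i] > 0x7F) {
            dst[i] = lookup_sbcs(src[i]);
            ++i;
        }
    }
    return i;
}
```

The sketch also hints at why such a fast path might not help on text with little ASCII: the range checks are still executed for every code unit, but most of them fall through to the lookup anyway.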