ICU Development Team Meeting
Time: 10:30am PDT, 1pm EDT, 19:30 CET
Date: 2002-Feb-06

Attendees:
    Markus Scherer (IBM)
    Vladimir Weinstein (IBM)
    Syn Wee Quek (IBM)
    Andy Heninger (IBM)
    George Rhoten (IBM)
    Ram Viswanadha (IBM)
    Steven R. Loomis (IBM)
    Daniel Chen (Lotus/IBM)
    Hyangmi Cho (Lotus/IBM)
    Tex Texin (Progress Software)
    Yves Arrouye (RealNames)
    Mark Davis (IBM)
    Helena Shih Chapman (IBM)

Minutes taker: Steven R. Loomis

Agenda Item
 - Data Loading  
   + Andy: Summary of goals.
    . Simplify loading
    . Allow installation by dropping in files 
    . Clean support for user data
    . Clean up fallbacks to remove possible conflicts

   + New design:
    . First look for a single file, then a common (packaged) file.
    . Overrides are now possible: a separate file replaces what is in the common file.
    . The package name is always attached to individual filenames, e.g. icudt20b_fr_BE.res; an application's data would be named mydata_fr_BE.res.
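
The lookup order in the new design can be sketched as below. This is only an illustration, not ICU's actual API: the function names are hypothetical, loose files and the common file are modeled as in-memory dicts, and it assumes an override is simply an individual file that is found before the common file is consulted.

```python
# Hypothetical sketch of the proposed lookup order; not ICU's actual API.

def item_name(package, locale, suffix="res"):
    """Build a package-qualified name such as 'icudt20b_fr_BE.res'."""
    return f"{package}_{locale}.{suffix}"


def load_item(package, locale, single_files, common_file):
    """Return (source, data) for one locale's resource.

    single_files: dict of file name -> bytes, standing in for loose files on disk.
    common_file:  dict of item name -> bytes, standing in for the packaged file.
    """
    name = item_name(package, locale)
    if name in single_files:      # 1. an individual file wins (acts as an override)
        return "single", single_files[name]
    if name in common_file:       # 2. otherwise use the common (packaged) file
        return "common", common_file[name]
    raise LookupError(name)


common = {
    "icudt20b_fr.res": b"fr (packaged)",
    "icudt20b_fr_BE.res": b"fr_BE (packaged)",
}
loose = {"icudt20b_fr_BE.res": b"fr_BE (override)"}

print(load_item("icudt20b", "fr_BE", loose, common))  # served from the loose file
print(load_item("icudt20b", "fr", loose, common))     # served from the common file
```

Because the package name is part of every item name, an application's mydata_fr_BE.res cannot collide with ICU's icudt20b_fr_BE.res even when both sit in the same directory.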

   + Tex thought this seems to resolve the issues. There should be a documented migration path for existing data. It is hard to know what the dependencies are between different ICU data items, and therefore hard for people to remove data they don't want without hurting performance or breaking ICU completely.
   + Daniel: Lotus would like to partition collation data from calendar data. On the Mac, this uses up lots of memory. (It was mentioned that removing Calendars would probably not save much space.)

   + A discussion about partitioning (having multiple ".dat" files) ensued.
    . Could store collation binary data in a different tree
    . Levels of partitioning are: Source, Installed files, and Representation in Memory
    . Tex: Might design some partitioning at build time and then deploy a certain partitioning to a customer. He wants the customer to be able to download a file and add it without a build, maybe with minimal repackaging. (A: You can also just replace a small 'icudata' file with a larger one.)
    . Could partition by region (tier, country, ..) or by platform (Mac converters, ..)
    . Tex: would like packaging to be as easy as 'zip them together and it works'.
      - Yves: decmn and gencmn will do this - but you don't know that fr_BE depends on fr.
    . Mark: have a search path with a bunch of files, then we have common files which contain a bunch of files.
       - Tex: Could cache the list of files to disk.
    . Markus asked if this is still a problem if the data set size goes down to 4MB.
       - Tex: Have a minimal set of requirements for most users, followed by a set of add-ons.
       - Perhaps a set without East Asian, and you drop in more when you need to.
       - Tex: want to treat everyone even-handedly.
    . Could remove EBCDIC on non-EBCDIC platforms
    . Yves: could have a search path and let users devise their own partitioning. Put .dat files on the search path, à la ".jar" files: read every .dat file on the search path and read its TOC. Tex could then ship an ICU with a large search path, which he would have to put in his base package.
    ..
    . Can put the package name (icudt20b) into the header of the common file
    . Yves: An application will need to know the version of ICU's data. This could be an argument to genrb: --prefix=mydata, or --prefix=ICUVERSION (with genrb expanding that string).
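
The search-path idea from the discussion above (several .dat files, each with a table of contents, read once and indexed) can be sketched as follows. Everything here is an assumption made for illustration: packages are modeled as (file name, list of item names) pairs rather than real .dat files, the file names addon_benelux.dat and base.dat are invented, and the rule that earlier search-path entries take precedence is a guess, since the meeting did not settle it. The fallback helper shows the dependency Yves mentioned (fr_BE depends on fr, which in turn depends on root), which a plain "zip them together" packaging step would not know about.

```python
# Hypothetical sketch: packages are modeled as lists of item names, not real ".dat" files.

def fallback_chain(locale):
    """'fr_BE' -> ['fr_BE', 'fr', 'root']: each bundle depends on its parents."""
    if locale == "root":
        return ["root"]
    parts = locale.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    return chain + ["root"]


def build_index(search_path):
    """Read every package's table of contents once; earlier entries take precedence."""
    index = {}
    for package_file, toc in search_path:
        for item in toc:
            index.setdefault(item, package_file)
    return index


def locate(prefix, locale, index):
    """Map each bundle in the fallback chain to the package holding it (or None)."""
    return {loc: index.get(f"{prefix}_{loc}.res") for loc in fallback_chain(locale)}


search_path = [
    ("addon_benelux.dat", ["icudt20b_fr_BE.res"]),
    ("base.dat", ["icudt20b_root.res", "icudt20b_fr.res"]),
]
index = build_index(search_path)
print(locate("icudt20b", "fr_BE", index))
print(locate("icudt20b", "ja", index))  # 'ja' maps to None: an add-on is missing
```

A None entry in the result is the kind of missing dependency a packaging tool could report before a partitioned data set is shipped.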

 - Daniel: Test mapping
   + Latin uppercase I and lowercase i do not compare as equal when ignore case is set.
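
For reference, the expected behavior can be shown with Python's built-in Unicode case folding rather than ICU itself. Under the default (locale-independent) folding, uppercase I folds to lowercase i, so the two should compare as equal. The helper name is hypothetical.

```python
def equal_ignoring_case(a, b):
    """Compare two strings using Unicode default case folding."""
    return a.casefold() == b.casefold()


print(equal_ignoring_case("I", "i"))
print(equal_ignoring_case("ICU", "icu"))
```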

 - Daniel: Greek Casing
   + Greek has a small alpha with accent; a Greek customer is asking for the uppercase form to be without the accent.
   + The Unicode standard says to do it the way ICU does. Should people be given the option?
   + Can fix this by building a small table in his code.
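
A small table of the kind suggested could look like the sketch below. It is written in Python for illustration and is not Daniel's code. The table and function names are hypothetical, and it covers only the seven precomposed lowercase vowels with tonos; every other character goes through the standard uppercase mapping.

```python
# Hypothetical table: accented lowercase Greek vowels -> unaccented capitals.
UNACCENTED_UPPER = {
    "\u03ac": "\u0391",  # alpha with tonos   -> ALPHA
    "\u03ad": "\u0395",  # epsilon with tonos -> EPSILON
    "\u03ae": "\u0397",  # eta with tonos     -> ETA
    "\u03af": "\u0399",  # iota with tonos    -> IOTA
    "\u03cc": "\u039f",  # omicron with tonos -> OMICRON
    "\u03cd": "\u03a5",  # upsilon with tonos -> UPSILON
    "\u03ce": "\u03a9",  # omega with tonos   -> OMEGA
}


def greek_upper_no_tonos(text):
    """Uppercase text, dropping the accent from the vowels listed in the table."""
    return "".join(UNACCENTED_UPPER.get(ch, ch.upper()) for ch in text)


print("\u03ac".upper() == "\u0386")                   # standard mapping keeps the accent
print(greek_upper_no_tonos("\u03ac") == "\u0391")     # table lookup removes it
```

Input that is already uppercase with an accent, or that uses a combining accent instead of a precomposed letter, is not handled by this table and would need extra entries or a normalization step.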

 - Meet again in two weeks.