ICU Core Meeting
Date: August 24, 2005 @ 10:00am PT

Attendees:
Vladimir Weinstein
George Rhoten (scribe)
Ram Viswanadha
Markus Scherer
Andy Heninger
Eric Mader
Steven Loomis
Tex Texin
Deborah Goldsmith
Mark Davis
Doug Felt

Agenda:
 - ICU 3.4 post mortem
 - ICU 3.6 status
 - Status update on codepage recognition
 - 64-bit lengths in ICU

Vladimir: We are asking for feedback for ICU 3.4

Deborah: Updating ICU is usually fairly easy now, except for the cvs import command.
	cvs import issues are unrelated to ICU.

Vladimir: Was the gcc 4 compiler bug fixed?

Deborah: The gcc bug was fixed in the gcc 4.0.1 train.
	Using gcc 3.3 is the workaround for this problem.

Eric: Using an older gcc compiler for Fedora Core 4 fixes the problem too.

Tex: We switched to 3.4 from 3.2, and there were no major issues.
	Also the Unicode version of PHP is getting a little more public. It's on php.net.
	There is some talk about whether this will be a minor update to PHP 5 or in PHP 6.

Vladimir: ICU 3.6 status.
	We are planning the 3.6 items. The feature list isn't final yet.

Mark: The list will be posted on the icu-core list.
	Good news. Eclipse will be using ICU4J.

Tex: The charset detection is going okay.
	Badi will have it implemented later this week.
	Testing will happen next week.

Andy: I owe an updated charset detection API later today.

Vladimir: Deborah wanted to talk about 64-bit lengths in ICU.

Deborah: We were wondering why size_t wasn't used?

Markus: UText will use int64_t.
	We aren't changing it for backwards compatibility.
	We did have a UTextOffset, but it was used inconsistently. So we removed it.

Deborah: It would be nice if there was a compile-time option to use size_t and ptrdiff_t.

Eric: This change is difficult to implement, since it's difficult to test and it would
	require a lot of changes.

Markus: UText is new API, and it's draft. So it's okay to change it to use int64_t.
	We prefer not to introduce new compiler options, since it increases build difficulties.
	If you look at the bool type, it's a variable type, and causes C++ name mangling problems.
	We have no pressure to implement 64-bit sizes.
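
Note: The trade-off discussed above can be illustrated with a minimal C sketch. It is not
ICU code. All names (EXAMPLE_USE_SIZE_T, example_length_t, example_offset_t and the two
helper functions) are hypothetical and exist only for illustration. The sketch assumes
a compile-time switch of the kind Deborah suggested, next to a fixed 64-bit offset type
of the kind Markus described for UText, and shows the range check a caller would need
when a 64-bit offset meets an API that takes int32_t lengths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time switch; ICU defines no such macro. */
#ifdef EXAMPLE_USE_SIZE_T
typedef ptrdiff_t example_length_t;  /* width follows the platform */
#else
typedef int32_t example_length_t;    /* fixed 32-bit width */
#endif

/* Hypothetical fixed-width 64-bit offset type (same width on every build). */
typedef int64_t example_offset_t;

/* Returns 1 if the 64-bit offset can be passed to an API taking int32_t. */
static int example_fits_int32(example_offset_t offset) {
    return offset >= INT32_MIN && offset <= INT32_MAX;
}

/* Clamps a 64-bit offset into the int32_t range instead of truncating it. */
static int32_t example_clamp_to_int32(example_offset_t offset) {
    if (offset > INT32_MAX) return INT32_MAX;
    if (offset < INT32_MIN) return INT32_MIN;
    return (int32_t)offset;
}
```

With the switch, the width of example_length_t depends on build flags, so every
configuration would need its own testing. example_offset_t is 64 bits wide in every
build, which is one way to read the preference for a fixed int64_t in new draft API.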

Mark: We can use UText to get the 64-bit type in the door.

Deborah: I think the UText approach will be a good thing to do.
	The 64-bit requirement is a general directive for Apple.
	We are thinking about break iteration, string search, collation and regex.

Vladimir: There was a Thai word breaking proposal. Can Deborah summarize?

Deborah: I haven't read it completely.
	I do know that many of the issues are implemented in Mac OS X.
	I agree with a lot of the issues with the existing ICU implementation.
	I don't agree with dynamic programming, since it's not good for performance.
	Abstract dictionary would be good.

Andy/Mark/Deborah: Technical discussion about Thai break iteration goes on...


Deborah: It looks like charset detection is moving along nicely.

Andy: Ken asked about the data used for the charset detection.
	We can't redistribute the data.

Mark/Eric: Unfortunately I don't think there is that much data that is in the public domain.

Mark: What if we started a public corpus to hold this kind of data?

Vladimir: That could be a CLDR topic.

Deborah: So it looks like there isn't any demand from IBM for flexible date/time formatting.

Mark: Correct, but it doesn't seem like a lot of work.

Tex: MySQL doesn't have good Unicode support. Are there any resources for this kind of testing?

Mark: There is test data for this in IBM, but it's not public.
	Tex should talk to IBM as a customer or partner to see if they can get access to it.


Meeting ends at 11:05am PT