International Calendars in Java
by Laura Werner
IBM Center for Java Technology
Cupertino, California
Over the last few years, many programmers have had a growing awareness of international issues, both in Java and in other languages. The software industry and the economy as a whole are becoming much more global, and there is an increasing need for applications that can function properly in more than one language and country. In addition, many programming toolkits such as the Java Class Libraries, the Win32 and Macintosh API’s, POSIX, etc., have fairly extensive international support built in, which makes writing an internationalized application much easier than it used to be.
While these API’s are all designed differently, at their core they provide a similar set of functionality. There are character converters that transform Unicode to legacy code pages, and vice-versa. There are sorting routines or collator objects that can be used for language-sensitive string comparison. There are facilities for word- and line-break detection in different languages. Finally, there are ways of formatting numbers, dates, times, and currencies for different languages and countries.
The date, time, and number formatters are necessary because most countries have different conventions for displaying this data. For example, an American English speaker would write the date 1/1/2000 AD (or 1/1/2000 CE) as "Saturday, January 1, 2000." A British English speaker might write "Saturday, 1 January 2000" instead. And a French speaker would write "samedi 1 janvier 2000". Any self-respecting international library can handle this for you. In Java, you’d use the class java.text.DateFormat to do the work.
Still, I suspect that even today a lot of programmers would look at the title of this article: "International Calendars", and wonder just what those two words have to do with each other. Dates, sure. But calendars? Though the topics might not seem linked at first, the connection is fairly obvious once you think about it.
Consider what should happen when a Hebrew speaker in Israel is using your program. The same date we discussed above, 1/1/2000 AD, would be displayed as "שבת 23 טבת 5760" or "Saturday 23 Tevet 5760". Not only are the strings different, the numbers are as well.
Though it comes as a surprise to many Americans, the official calendar in Israel is the Hebrew calendar, not the Gregorian one that we use in most of the Western world. The Hebrew calendar, as well as many others such as Hijri (Islamic), Hindu, Buddhist, and Japanese, all number the years differently. Many of them, including the Hebrew, also have a different system for calculating months, which leads to 1/1/2000 AD being "23 טבת", or "23 Tevet", rather than January 1.
All of this means that internationalization of dates and times requires more than just a different table of strings for each language. A program that naïvely assumes that the Gregorian calendar applies everywhere will be hopelessly wrong in countries that use a different calendar rather than just a different language.
In this article, I’ll discuss the Java Class Library facilities that allow you to manipulate and display dates and times. Next, I’ll show how you can extend the Java calendar classes to support calendars that are not built in to the JDK. Finally, I’ll discuss some free classes from IBM that support the Buddhist, Hebrew, Hijri, and Japanese Imperial calendars.
First, let me jump back to JDK 1.0 for a moment. The first release of Java had relatively poor support for international dates and times, with java.util.Date and its toString method the only real tools at your disposal. The situation in the rest of Java was similar. It had the beginnings of international support, because a Java char is stored as a Unicode character. But that's all. You couldn't enter or display non-Latin characters, and there were no facilities for language-sensitive formatting, sorting, and so on.
The management of Sun and IBM found a way to fix this problem for JDK 1.1. Java was missing international support. But IBM’s Taligent subsidiary had great international technology, talented engineers -- including Dr. Mark Davis, president of the Unicode Consortium -- and a location about 100 yards away from Sun’s JavaSoft division in Cupertino, California. Thus a partnership was born. IBM arranged for Taligent’s Text and International group to contribute international classes into Sun’s JDK in order to make Java powerful enough for real-world business applications.
Taligent, in collaboration with Sun's internationalization engineers, provided the new java.text package, plus a number of new classes in java.util. This included the date- and time-related classes DateFormat, SimpleDateFormat, Calendar, GregorianCalendar, TimeZone, and SimpleTimeZone. I’ll discuss these classes in turn, starting with the old Date class.
Date
The java.util.Date class has been part of Java since JDK 1.0. Each instance of Date represents a particular instant in time, stored as a long number of milliseconds since January 1, 1970 AD, 00:00 GMT. To construct a Date, you would typically use the constructor:
Date(int year, int month, int date)
or one of its variants that takes additional arguments such as hours, minutes,
seconds, etc. In addition, there is a constructor whose argument is a String
such as "Sat Aug 12 1995 13:30:00 GMT".
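To make the "milliseconds since 1970" representation concrete, here is a minimal sketch. The class name DateInstantDemo and the helper formatInGmt are invented for this illustration, and a fixed pattern in GMT is used only so that the output does not depend on the machine it runs on.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateInstantDemo {
    // Formats an instant in GMT with a fixed pattern, so the result does not
    // depend on the default locale or time zone of the machine.
    static String formatInGmt(Date d) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f.format(d);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);           // zero milliseconds since the epoch
        Date nextDay = new Date(86400000L);  // 24 * 60 * 60 * 1000 ms later
        System.out.println(epoch.getTime() + " ms is " + formatInGmt(epoch));
        System.out.println(nextDay.getTime() + " ms is " + formatInGmt(nextDay));
        // Comparisons work on the stored long value.
        System.out.println(nextDay.after(epoch));
    }
}
```

The constructor that takes a long is the one part of Date that carries no calendar or time zone assumptions, which is why it survived the JDK 1.1 redesign.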
In JDK 1.0, and even today, these methods all work as advertised. If you execute the following code:
Date d = new Date(99, 1, 1); String s = d.toString();
the value of s will be "Mon Feb 1 00:00:00 PST 1999".
There are a few obvious problems here. The most blatant is that the year argument to the constructor is only two digits, with 1900 assumed to be the origin. This is a huge Y2K problem, because there’s no way to specify a date before 1900 or after 1999. There’s also no obvious way to fix this. Sun can’t change the meaning of the first parameter, because that would break existing code. Adding a Y2K-safe overload would be difficult too, since the new overload would need to have different argument types.
The next problem is the toString method. The JDK 1.0 documentation stated that the string it returns was always of the form "Sat Aug 12 1995 13:30:00 GMT", with US English day and month names and time zone abbreviations. Again, there was no way to change this in a later release. Once the documentation, which is effectively the specification for the Java Class Libraries, guarantees a certain behavior, it can't be changed without the danger of breaking existing applications.
Finally, take another look at the Date constructor in the code snippet above:
Date d = new Date(99, 1, 1);
Notice that we passed in "1" for the month and day, but the resulting date was February 1st. Since arrays in Java and C are 0-based, and since month numbers are often used as an index into an array of strings, the original designers of Java decided to make the month numbers 0-based. So January is month 0, February is month 1, and so on. Unfortunately most people, even programmers, think of January as the 1st month of the year, not the 0th, so this choice has led to a great deal of confusion.
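The zero-based month is easy to check for yourself. The following sketch (the class ZeroBasedMonthDemo and its method monthOf are hypothetical names) builds a Date with the deprecated constructor and reads the month back through a GregorianCalendar, which uses the same numbering.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

class ZeroBasedMonthDemo {
    // Builds a Date with the deprecated JDK 1.0 constructor and reads the
    // month back through a Calendar in the same (default) time zone.
    @SuppressWarnings("deprecation")
    static int monthOf(int yearSince1900, int zeroBasedMonth, int day) {
        Date d = new Date(yearSince1900, zeroBasedMonth, day);
        Calendar c = new GregorianCalendar();
        c.setTime(d);
        return c.get(Calendar.MONTH);
    }

    public static void main(String[] args) {
        int month = monthOf(99, 1, 1);
        System.out.println(month == Calendar.FEBRUARY); // true: month 1 is February
        System.out.println(month == Calendar.JANUARY);  // false: January is month 0
    }
}
```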
When it was time to work on JDK 1.1, we were faced with a decision: What should we do about Date? Because of the problems I discussed above, IBM and Sun decided that Date was so broken that it couldn't be fixed and decided to replace it instead. But since Date was trying to do so many different things -- date formatting, calendar calculations, and time zones -- we decided to replace it with several different classes.
Date Formatting
The first of these new classes is DateFormat. As I mentioned above, Date's String constructor and its toString method had two problems: the strings were in a fixed format and they were always in English. DateFormat solves both of these problems.
The job of DateFormat and its concrete subclass SimpleDateFormat is to convert from a Date object to a String and vice-versa, and to do it properly for all of the locales that Java supports. To format a date and time for the current locale, the code is fairly simple:
Date d = new Date(99, 0, 1);
DateFormat f = DateFormat.getDateTimeInstance(
    DateFormat.FULL,   // Date style
    DateFormat.FULL);  // Time style
String s = f.format(d);
If you run this on a US English system, the result will be "Friday, January 1, 1999 12:00:00 AM PST".
So far, this just seems like an expensive way of spelling Date.toString. But you get something for the extra effort: internationalization. If you change the second line of code to this:
DateFormat f = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, Locale.FRANCE);
the result will appear in French: "vendredi 1 janvier 1999 00:00:00 GMT-08:00".
DateFormat also solves the fixed-format problem. The examples above use a "FULL" date/time formatter. If you want to be a bit more concise, you can use DateFormat.MEDIUM, which gives the result "Jan 1, 1999 12:00:00 AM" for English. Similarly, DateFormat.SHORT gives "1/1/99 12:00 AM."
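To compare the styles side by side, a small loop is enough. This is only a sketch: DateStyleDemo and formatDate are names made up for the example, and the exact strings depend on the locale data shipped with your Java runtime, so no output is shown here.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class DateStyleDemo {
    // Formats only the date portion, in the given style and locale.
    static String formatDate(int style, Locale locale, Date d) {
        return DateFormat.getDateInstance(style, locale).format(d);
    }

    public static void main(String[] args) {
        Date d = new GregorianCalendar(1999, Calendar.JANUARY, 1).getTime();
        int[] styles = { DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL };
        String[] names = { "SHORT", "MEDIUM", "LONG", "FULL" };
        for (int i = 0; i < styles.length; i++) {
            // Same instant, same style, two locales.
            System.out.println(names[i] + ": "
                    + formatDate(styles[i], Locale.US, d) + " / "
                    + formatDate(styles[i], Locale.FRANCE, d));
        }
    }
}
```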
If you want to see just the date in your output, not the time, the solution is also simple -- call getDateInstance instead of getDateTimeInstance:
Date d = new Date(99, 0, 1);
DateFormat f = DateFormat.getDateInstance(DateFormat.FULL);
String s = f.format(d);
and the result will be "Friday, January 1, 1999."
Now, remember the Date constructor that takes a String. That constructor had the same problems as toString, but in reverse: it required a fixed format and it assumed the string would be in English. DateFormat solves these problems too, because it doesn’t just format dates; it parses them. For example, consider the following code:
DateFormat f = DateFormat.getDateInstance(DateFormat.FULL, Locale.FRANCE); Date d = f.parse("vendredi 1 janvier 1999");
The Date object, d, will end up referring to 1/1/1999. All of the other points I discussed above apply to parsing as well as to formatting: you can request a particular locale, choose shorter or longer formats, etc.
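One detail worth knowing is that parse throws a checked ParseException when the text does not match. The sketch below wraps that in a helper and checks that formatting and parsing are inverses. The names DateParseDemo, formatFull, and parseFull are assumptions of this example, not part of the JDK.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class DateParseDemo {
    static String formatFull(Date d, Locale locale) {
        return DateFormat.getDateInstance(DateFormat.FULL, locale).format(d);
    }

    // Returns the parsed Date, or null if the text does not match the
    // locale's FULL date format.
    static Date parseFull(String text, Locale locale) {
        try {
            return DateFormat.getDateInstance(DateFormat.FULL, locale).parse(text);
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Date original = new GregorianCalendar(1999, Calendar.JANUARY, 1).getTime();
        String text = formatFull(original, Locale.FRANCE);
        Date parsed = parseFull(text, Locale.FRANCE);
        System.out.println(text + " -> round trip ok: " + original.equals(parsed));
        System.out.println(parseFull("not a date", Locale.FRANCE)); // prints null
    }
}
```

Round-tripping through the same formatter avoids hard-coding a string that might differ slightly between versions of the locale data.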
Many of the DateFormat examples I showed above included time zones in their output. In JDK 1.0, all of Java's time zone logic was baked into Date.toString. It assumed that you always wanted dates displayed using the current default time zone, and that you wanted the US English abbreviations for the time zone names. This was fixed in JDK 1.1 as well, with the addition of the new class java.util.TimeZone.
TimeZone and its concrete subclass SimpleTimeZone are relatively low-level classes that encapsulate the relationship between local clock time and Greenwich Mean Time. You can use them to convert from GMT to local time and back as well as to determine whether daylight savings time is in effect. The other classes use TimeZone in their time-related calculations, and many of them expose the time zone as a property that you can get and set.
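As an illustration of those two jobs, the following sketch asks a TimeZone for its offset from GMT and for its daylight time status at two instants. It assumes the zone ID "America/Los_Angeles" for US Pacific time and uses TimeZone.getOffset(long), which was added to the JDK after 1.1. The class and method names are invented for the example.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class TimeZoneDemo {
    // "America/Los_Angeles" is an assumed zone ID for US Pacific time.
    static final TimeZone PACIFIC = TimeZone.getTimeZone("America/Los_Angeles");

    // Instant of local midnight on the given Gregorian date in the Pacific zone.
    static Date pacificMidnight(int year, int month, int day) {
        GregorianCalendar c = new GregorianCalendar(PACIFIC);
        c.clear();
        c.set(year, month, day);
        return c.getTime();
    }

    // Total offset from GMT in hours at the given instant, daylight time included.
    static int offsetHours(Date d) {
        return PACIFIC.getOffset(d.getTime()) / (60 * 60 * 1000);
    }

    public static void main(String[] args) {
        Date winter = pacificMidnight(1999, Calendar.JANUARY, 1);
        Date summer = pacificMidnight(1999, Calendar.JULY, 1);
        System.out.println(offsetHours(winter) + " " + PACIFIC.inDaylightTime(winter)); // -8 false
        System.out.println(offsetHours(summer) + " " + PACIFIC.inDaylightTime(summer)); // -7 true
    }
}
```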
Here's a simple example. Say that you want to display the same date we've been using in all of our examples, but that you want to force it to be displayed in GMT, regardless of the time zone you're running in. The code would look like this:
Date d = new Date(99, 0, 1);
DateFormat f = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL);
f.setTimeZone(TimeZone.getTimeZone("GMT"));
String s = f.format(d);
The formatter will now use GMT, so the result will be something like "Friday, January 1, 1999 08:00:00 AM GMT". Note that the time zone is different, and that the time of day is 08:00, since GMT is 8 hours ahead of PST.
In JDK 1.1, there was one problem with the way that DateFormat used TimeZone. Let's jump back to this example for a moment:
Date d = new Date(99, 0, 1);
DateFormat f = DateFormat.getDateTimeInstance(DateFormat.FULL, DateFormat.FULL, Locale.FRANCE);
String s = f.format(d);
As I mentioned above, you'd expect the result to be "vendredi 1 janvier 1999 00:00:00 GMT-08:00" if you were running your computer in the Pacific time zone. However, in JDK 1.1.5 and earlier, the result would actually be "vendredi 1 janvier 1999 09:00:00 CET". CET is "Central European Time", the time zone used in Paris, which is one hour ahead of GMT. Rather than using the system's default time zone, DateFormat was using the first time zone it could find for the locale you requested, in this case Locale.FRANCE. This caused no end of confusion, since it was almost never what programmers expected.
Fortunately, there was a simple workaround for this problem. To force a DateFormat to use the default time zone, you just do this:
DateFormat f = . . . .; f.setTimeZone(TimeZone.getDefault());
In JDK 1.1.6 we fixed this problem, and a newly-created DateFormat object always uses the default time zone. If you need a different time zone, you can always call DateFormat.setTimeZone to request the one you want.
Now that I've given a quick tour of the other date-related classes that were new to JDK 1.1, I can go on to the meat of this article: java.util.Calendar. As an introduction, let's revisit a code snippet from our discussion of Date:
Date d = new Date(99, 1, 1);
In this example, the constructor arguments are interpreted in only one way: February 1, 1999 AD, in the Gregorian calendar. There's a similar problem with Date methods such as getDay, getMonth, getYear, etc. But as I described in the introduction, some countries use different calendars: Hebrew, Hijri, or whatever. A fully-internationalized Java application needs to be able to support multiple calendar systems, not just the Gregorian one.
Since the Java Class Libraries are object-oriented, the obvious solution to this problem is to create an abstract class that represents a generic calendar, with concrete subclasses for specific calendar systems. And that's just what we did. JDK 1.1 included a new abstract class, java.util.Calendar, which provides a generic API for calendar operations. It also included one concrete subclass, which as you might guess is GregorianCalendar.
Calendar has a number of abstract methods that parallel the old, deprecated get methods of Date. For example, imagine that you want to find out what year it is. With the old Date API, the code would look like this:
int year = new Date().getYear();
With Calendar, you do this instead:
int year = Calendar.getInstance().get(Calendar.YEAR);
The call to getInstance creates a Calendar that is appropriate for the current locale, and the call to get returns the current value of the calendar's YEAR field. Calendar provides constants for about fifteen fields, including YEAR, DAY_OF_MONTH, DAY_OF_WEEK, WEEK_OF_YEAR, and many others. These constants are all interpreted in terms of the calendar system that your Calendar object represents, so if you have a Hebrew calendar object, you'll get the Hebrew year, not the Gregorian one.
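Here is a short sketch that reads several fields from one fixed date, so the values are easy to verify by hand. CalendarFieldsDemo and newYearsDay2000 are names chosen for this example only.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class CalendarFieldsDemo {
    static Calendar newYearsDay2000() {
        return new GregorianCalendar(2000, Calendar.JANUARY, 1);
    }

    public static void main(String[] args) {
        Calendar c = newYearsDay2000();
        System.out.println("YEAR         = " + c.get(Calendar.YEAR));
        System.out.println("MONTH        = " + c.get(Calendar.MONTH)); // zero-based
        System.out.println("DAY_OF_MONTH = " + c.get(Calendar.DAY_OF_MONTH));
        System.out.println("DAY_OF_YEAR  = " + c.get(Calendar.DAY_OF_YEAR));
        // DAY_OF_WEEK values are best compared against the named constants.
        System.out.println("Saturday?      "
                + (c.get(Calendar.DAY_OF_WEEK) == Calendar.SATURDAY));
    }
}
```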
If you're wondering how this works, remember that Calendar is an abstract class. Each time that get is called, the calendar checks to see if the fields are up to date. If they are not, it calls the abstract, protected method computeFields. Each subclass overrides this method to perform the calculations appropriate for that calendar system. For example, GregorianCalendar has a computeFields method that performs the standard Gregorian calculations.
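One way to watch this mechanism is to subclass GregorianCalendar and count the calls to computeFields. The sketch below does that with a hypothetical class, CountingCalendar. Exactly when and how often the JDK calls computeFields is an implementation detail, so treat the counter as a diagnostic rather than something to rely on.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class CountingCalendar extends GregorianCalendar {
    // Deliberately left without an initializer: the superclass constructor may
    // already call computeFields() before subclass initializers would run.
    private int computeCalls;

    CountingCalendar() {
        super(TimeZone.getTimeZone("GMT"));
    }

    @Override
    protected void computeFields() {
        computeCalls++;          // record that the fields were recomputed
        super.computeFields();   // let GregorianCalendar do the real work
    }

    int getComputeCalls() {
        return computeCalls;
    }

    public static void main(String[] args) {
        CountingCalendar c = new CountingCalendar();
        c.setTimeInMillis(0L);   // the fields must now be derived from the new time
        System.out.println("year = " + c.get(Calendar.YEAR));
        System.out.println("computeFields calls so far: " + c.getComputeCalls());
    }
}
```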
Calendar.get replaces the deprecated get methods on Date, but what about the constructor? That functionality is provided by constructors on the concrete Calendar subclasses. If you want to construct a Calendar set to January 1, 2000, you write:
Calendar c = new GregorianCalendar(2000, Calendar.JANUARY, 1);
This solves the problem neatly. You know exactly which calendar system will be used to interpret the year, month, and day that you've specified: the calendar that you're instantiating. If someone has input a Hebrew date, and you have a HebrewCalendar class, the code is fairly obvious:
Calendar c = new HebrewCalendar(5760, HebrewCalendar.TEVET, 23);
January is still Zero
You would think that when we deprecated most of Date and added the new Calendar class, we would have fixed Date's biggest annoyance: the fact that January is month 0. We certainly should have, but unfortunately we didn't. We were afraid that programmers would be confused if Date used zero-based months and Calendar used one-based months. And a few programmers probably would have been. But in hindsight, the fact that Calendar is still zero-based has caused an enormous amount of confusion, and it was probably the biggest single mistake in the Java international API's.
When you're using Calendar or any of its subclasses, it's usually best not to use raw numbers in Calendar calls unless you just can't avoid it. Instead of writing code like this:
Calendar c = new GregorianCalendar(2000, 0, 1);
write this instead:
Calendar c = new GregorianCalendar(2000, Calendar.JANUARY, 1);
This takes a bit longer to type, but it's a lot less error-prone.
Add and Roll
One aspect of Calendars that wasn't addressed at all in Date was calendar manipulation. For example, imagine that you want to determine what the date will be one month in the future. With the old API, you had to do an awful lot of work on your own: call getMonth, getYear, and getDate, add one to the month, see if it wrapped to a new year, make sure the day of the month is still in bounds (remembering those leap years!) and so on. Not only is this a hassle, it's not internationalized. Other calendar systems have a different number of days per month, different (and possibly variable) months per year, different leap year calculations, and so on.
Calendar and its subclasses solve this problem for you, with their add and roll methods. If you want to add one month to the current date, you only need two lines of code:
Calendar c = Calendar.getInstance(); c.add(Calendar.MONTH, 1);
As you'd expect, the add method adds the given number to the field that you specify. It knows about all of the rules for the calendar system, so code like this will work properly.
GregorianCalendar c = new GregorianCalendar(1999, Calendar.JULY, 29);
c.add(Calendar.MONTH, 7);
String s = DateFormat.getInstance().format(c.getTime());
The result of this calculation will be February 29, 2000. GregorianCalendar.add knows that when the month passes 11 (December) it should roll back to 0. It also knows that February 29, 2000 is a valid date, using the complicated rules that it is a leap year because it's divisible by four, except it isn't because it's divisible by 100, except it is because it's divisible by 400.
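The interaction between add and the leap year rules shows up most clearly at the end of January. In this sketch (AddMonthDemo and addMonths are hypothetical names), adding one month to January 31 lands on the last valid day of February, whichever day that is.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class AddMonthDemo {
    // Adds the given number of months and returns {year, month, dayOfMonth}.
    static int[] addMonths(int year, int month, int day, int months) {
        GregorianCalendar c = new GregorianCalendar(year, month, day);
        c.add(Calendar.MONTH, months);
        return new int[] {
            c.get(Calendar.YEAR), c.get(Calendar.MONTH), c.get(Calendar.DAY_OF_MONTH)
        };
    }

    public static void main(String[] args) {
        GregorianCalendar g = new GregorianCalendar();
        System.out.println(g.isLeapYear(1996)); // true:  divisible by 4
        System.out.println(g.isLeapYear(1900)); // false: divisible by 100
        System.out.println(g.isLeapYear(2000)); // true:  divisible by 400

        int[] a = addMonths(1999, Calendar.JANUARY, 31, 1);
        int[] b = addMonths(2000, Calendar.JANUARY, 31, 1);
        System.out.println(a[0] + "-" + (a[1] + 1) + "-" + a[2]); // 1999-2-28
        System.out.println(b[0] + "-" + (b[1] + 1) + "-" + b[2]); // 2000-2-29
    }
}
```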
Closely related to add is the roll method. This method is handy when you want to implement a user interface that "rolls" from the end of a month back to the beginning of the same month, or to do the same thing for weeks or years. The usage is almost identical to add:
GregorianCalendar c = new GregorianCalendar(1999, Calendar.JULY, 29); c.roll(Calendar.DAY_OF_MONTH, 6);
The result will be July 4, 1999. If roll just added 6 to 29 it would get 35, which is past the end of July. Add would handle this by continuing on into August, ending up on August 4th. But roll is different. It wraps back to the beginning of July and ends up on July 4th.
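A side-by-side comparison makes the difference easy to see. The sketch below uses the roll(int, int) overload, which arrived in Java 2. The class RollVsAddDemo and its helper are invented for the example.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class RollVsAddDemo {
    // Starts from July 29, 1999, moves the day of the month by the given
    // amount with either roll or add, and returns {month, dayOfMonth}.
    static int[] monthAndDay(boolean useRoll, int amount) {
        GregorianCalendar c = new GregorianCalendar(1999, Calendar.JULY, 29);
        if (useRoll) {
            c.roll(Calendar.DAY_OF_MONTH, amount); // larger fields stay unchanged
        } else {
            c.add(Calendar.DAY_OF_MONTH, amount);  // overflow carries into the month
        }
        return new int[] { c.get(Calendar.MONTH), c.get(Calendar.DAY_OF_MONTH) };
    }

    public static void main(String[] args) {
        int[] rolled = monthAndDay(true, 6);
        int[] added = monthAndDay(false, 6);
        System.out.println("roll: month " + (rolled[0] + 1) + ", day " + rolled[1]); // 7, 4
        System.out.println("add:  month " + (added[0] + 1) + ", day " + added[1]);   // 8, 4
    }
}
```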
Locale-Specific Calendar Properties
If you thought we'd now solved all possible calendar internationalization problems, you'd be incorrect. Even within a single calendar system, such as Gregorian, there are a few properties that can differ from one country to the next. As an example, here are the US and French versions of the calendar for July, 1999:
United States
| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|     |     |     |     | 1   | 2   | 3   |
| 4   | 5   | 6   | 7   | 8   | 9   | 10  |
| 11  | 12  | 13  | 14  | 15  | 16  | 17  |
| 18  | 19  | 20  | 21  | 22  | 23  | 24  |
| 25  | 26  | 27  | 28  | 29  | 30  | 31  |

France
| lun | mar | mer | jeu | ven | sam | dim |
|     |     |     | 1   | 2   | 3   | 4   |
| 5   | 6   | 7   | 8   | 9   | 10  | 11  |
| 12  | 13  | 14  | 15  | 16  | 17  | 18  |
| 19  | 20  | 21  | 22  | 23  | 24  | 25  |
| 26  | 27  | 28  | 29  | 30  | 31  |     |
Notice that in France, the first day of the week is Monday (or lundi), while in the United States it is Sunday. If you're writing an application that displays calendars graphically, you need to take this into account. Java provides the method Calendar.getFirstDayOfWeek to handle this. When you create a calendar, you can specify the locale you're interested in:
Calendar c = Calendar.getInstance(Locale.FRANCE);
and then call getFirstDayOfWeek to find out how to draw it:
int d = c.getFirstDayOfWeek();
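To show how that value feeds into drawing, here is a rough sketch that lays out a month as a text grid. MonthGridDemo, leadingBlanks, and render are hypothetical names, and the layout is only one of many ways to draw a calendar. The key step is the modular arithmetic that turns the weekday of the 1st into a number of empty leading cells.

```java
import java.util.Calendar;
import java.util.Locale;

class MonthGridDemo {
    // Number of empty cells before day 1 when the leftmost column is the
    // locale's first day of the week.
    static int leadingBlanks(int year, int month, Locale locale) {
        Calendar c = Calendar.getInstance(locale);
        c.clear();
        c.set(year, month, 1);
        return (c.get(Calendar.DAY_OF_WEEK) - c.getFirstDayOfWeek() + 7) % 7;
    }

    // Renders the days of the month as rows of seven three-character cells.
    static String render(int year, int month, Locale locale) {
        Calendar c = Calendar.getInstance(locale);
        c.clear();
        c.set(year, month, 1);
        int blanks = leadingBlanks(year, month, locale);
        int days = c.getActualMaximum(Calendar.DAY_OF_MONTH);
        StringBuilder grid = new StringBuilder();
        for (int i = 0; i < blanks; i++) {
            grid.append("   ");
        }
        for (int day = 1; day <= days; day++) {
            grid.append(day < 10 ? " " : "").append(day).append(' ');
            if ((blanks + day) % 7 == 0) {
                grid.append('\n'); // end of a week row
            }
        }
        return grid.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(1999, Calendar.JULY, Locale.US));
        System.out.println(render(1999, Calendar.JULY, Locale.FRANCE));
    }
}
```

For July 1999 this gives four leading blanks for the US layout and three for the French one, matching the two tables above.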
A related method is getMinimalDaysInFirstWeek, which tells you how long a week has to be to qualify as the "first" week of the month. In the US calendar shown above, is the first week of July the week that starts on July 4, or the previous one that starts on June 27? According to Java's locale data it's the latter, because getMinimalDaysInFirstWeek returns 1.
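The effect of this setting can be seen through the WEEK_OF_MONTH field. The sketch below (FirstWeekDemo and weekOfMonth are made-up names) fixes Sunday as the first day of the week and varies only the minimal-days value. The partial week of June 27 contains just three days of July, so it counts as week 1 only when the minimum is three or fewer. Otherwise the Calendar documentation says days before week 1 get a WEEK_OF_MONTH of 0.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class FirstWeekDemo {
    // WEEK_OF_MONTH for the given day of July 1999, with Sunday as the first
    // day of the week and the given minimal number of days in the first week.
    static int weekOfMonth(int dayOfJuly, int minimalDays) {
        GregorianCalendar c = new GregorianCalendar(1999, Calendar.JULY, dayOfJuly);
        c.setFirstDayOfWeek(Calendar.SUNDAY);
        c.setMinimalDaysInFirstWeek(minimalDays);
        return c.get(Calendar.WEEK_OF_MONTH);
    }

    public static void main(String[] args) {
        System.out.println(weekOfMonth(1, 1)); // 1: a three-day week is long enough
        System.out.println(weekOfMonth(1, 4)); // 0: July 1 falls before week 1
        System.out.println(weekOfMonth(4, 4)); // 1: the first full week
    }
}
```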
Creating your own Calendars
All of the international calendar features I've talked about so far are great. However, there's a catch that limits the amount of calendar internationalization that you can actually do. Both JDK 1.1 and Java 2 only provide one concrete subclass of Calendar: GregorianCalendar. The traditional calendars used in other countries are not yet supported.
However, all is not lost. It is entirely possible to create your own subclasses of Calendar that support different calendar systems. I've written classes that support the Buddhist, Hebrew, Hijri, and Japanese Imperial calendars, and I want to share some of that knowledge here.
When you look at the Calendar class, you'll notice that it has 11 abstract methods: add, after, before, equals, getMinimum, getMaximum, getGreatestMinimum, getLeastMaximum, roll, computeTime, and computeFields. Implementing your own calendar subclass requires that you override all of these methods to provide an implementation that's specific to your calendar system. These methods can be divided into three basic groups.
The first group, the minimum and maximum functions, are the easiest, so these are usually the ones that I implement first. The first two, getMinimum and getMaximum, tell you the largest allowable range for each field, while getLeastMaximum and getGreatestMinimum tell you the smallest range for the field. For example, the DAY_OF_MONTH field of GregorianCalendar has a minimum and maximum of 1 and 31, but a greatest minimum and least maximum of 1 and 28. Implementing this is easy. Since the result is a constant, you can just store it in a table. The methods almost always end up looking like this:
public int getMinimum(int field) { return minMax[field][0]; }
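A fuller version of the table idea might look like the following sketch. The class LimitTableSketch, the column layout, and the decision to fill in only three Gregorian rows are all assumptions made for illustration. A real subclass would have one row for every Calendar field and would expose these as instance methods.

```java
import java.util.Calendar;

class LimitTableSketch {
    // Column indexes within each row of the table.
    private static final int MIN = 0, GREATEST_MIN = 1, LEAST_MAX = 2, MAX = 3;

    // One row per Calendar field; only three Gregorian rows are filled in here.
    private static final int[][] LIMITS = new int[Calendar.FIELD_COUNT][];
    static {
        LIMITS[Calendar.MONTH]        = new int[] { 0, 0, 11, 11 };
        LIMITS[Calendar.DAY_OF_MONTH] = new int[] { 1, 1, 28, 31 };
        LIMITS[Calendar.DAY_OF_WEEK]  = new int[] { 1, 1, 7, 7 };
    }

    private static int limit(int field, int column) {
        if (field < 0 || field >= LIMITS.length || LIMITS[field] == null) {
            throw new IllegalArgumentException("no limits stored for field " + field);
        }
        return LIMITS[field][column];
    }

    static int getMinimum(int field)         { return limit(field, MIN); }
    static int getGreatestMinimum(int field) { return limit(field, GREATEST_MIN); }
    static int getLeastMaximum(int field)    { return limit(field, LEAST_MAX); }
    static int getMaximum(int field)         { return limit(field, MAX); }

    public static void main(String[] args) {
        System.out.println(getLeastMaximum(Calendar.DAY_OF_MONTH)); // 28
        System.out.println(getMaximum(Calendar.DAY_OF_MONTH));      // 31
    }
}
```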
The real heart of a calendar class consists of its computeFields and computeTime methods. The first, computeFields, calculates the values of all of the fields (year, month, day, etc.) from the absolute time, which is represented as the number of milliseconds since January 1, 1970. Conversely, computeTime uses the field values to calculate the absolute time.
These two methods are usually quite complicated, because they must implement the calendar system's rules very precisely. The details for a real calendar are way beyond the scope of this article, so I've invented a very simple calendar that we can experiment with. It has 360 days per year, divided into 12 months of 30 days each, with no leap years. There are seven days per week, just like our calendar, and the day 1/1/1 in this calendar was a Saturday. Based on this simplification, I can offer a few generalizations.
First, your calculations will usually be based on an "epoch" date on which the calendar started. Usually you'll want this to be the 0th day of your calendar, that is the day before the first day of year 1. You should define a constant that specifies the epoch in milliseconds since 1/1/1970 AD. I'll start our example calendar on the same day the Hebrew calendar started, just because I have the constant handy:
private static final long EPOCH_MILLIS = -180799862400000L;
You'll also need a few constants for the number of milliseconds in a second, minute, hour, etc:
private static final long SECOND_MS = 1000;
private static final long MINUTE_MS = 60 * SECOND_MS;
private static final long HOUR_MS = 60 * MINUTE_MS;
private static final long DAY_MS = 24 * HOUR_MS;
Next, you'll want to get used to modular arithmetic, because you'll be doing a lot of it. Many of the calculations will be based on the number of days since the epoch, so you should calculate that first:
long absDay = (time - EPOCH_MILLIS) / DAY_MS;
Once you have that number, it's easy to calculate the year, month, and day:
int year = (int)(absDay / 360) + 1;
int month = (int)((absDay / 30) % 12) + 1;
int day = (int)(absDay % 30) + 1;
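Putting the pieces together, the two directions of the conversion for the simplified 360-day calendar can be sketched as a pair of static methods. This is an illustration under stated assumptions, not a Calendar subclass. SimpleCalendarMath and its method names are invented. EPOCH_MILLIS is treated as midnight at the start of day 1/1/1, so that absolute day 0 maps to 1/1/1 as the formulas above imply. Math.floorDiv and Math.floorMod (Java 8 and later) replace plain division so that instants before the epoch also land on the right day.

```java
class SimpleCalendarMath {
    static final long EPOCH_MILLIS = -180799862400000L; // constant from the article
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // The computeFields direction: absolute time to {year, month, day}, all 1-based.
    static int[] fieldsFromMillis(long time) {
        // floorDiv rounds toward negative infinity, which keeps instants
        // before the epoch on the correct day.
        long absDay = Math.floorDiv(time - EPOCH_MILLIS, DAY_MS);
        int year = (int) Math.floorDiv(absDay, 360L) + 1;
        int month = (int) (Math.floorMod(absDay, 360L) / 30) + 1;
        int day = (int) Math.floorMod(absDay, 30L) + 1;
        return new int[] { year, month, day };
    }

    // The computeTime direction: 1-based fields to midnight of that day.
    static long millisFromFields(int year, int month, int day) {
        long absDay = (year - 1) * 360L + (month - 1) * 30L + (day - 1);
        return EPOCH_MILLIS + absDay * DAY_MS;
    }

    public static void main(String[] args) {
        long t = millisFromFields(2, 2, 11);  // absolute day 400
        int[] f = fieldsFromMillis(t);
        System.out.println(f[0] + "/" + f[1] + "/" + f[2]); // 2/2/11
    }
}
```

Checking that each method undoes the other is a cheap sanity test, and it is just as useful when the arithmetic belongs to a real calendar.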
Of course, the calculations are a lot more complicated for real calendar systems, what with the variable month and year lengths, leap years, and the other complex interdependencies such as the "postponement rules" in the Hebrew calendar. There are a number of good references on this subject, including some web sites you can find with most search engines. But my favorite resource is the book Calendrical Calculations, which is listed in the references at the end of this article.
Real Code
If you'd like to see some real Java classes that implement non-Gregorian calendars, pay a visit to http://www.alphaWorks.ibm.com/tech/calendars. The "International Calendars" package you'll find on that page supports the Buddhist, Hebrew, Hijri, and Japanese Imperial calendars. It includes Java ResourceBundle files containing translated strings for these calendars in a number of different languages, as well as some utility methods for formatting dates as strings using non-Gregorian calendars.
I hope this article has given you a good feel for the things that you can do with calendars in Java. Though it has its warts, Java's calendar framework is the most powerful one I've seen in any major operating system or application framework.
Acknowledgements
Alan Liu, the IBM engineer responsible for the time and date classes in the JDK, was very helpful while I was writing this paper.
References
Calendrical Calculations, by Nachum Dershowitz and Edward M. Reingold (Cambridge University Press, 1997) has excellent descriptions of calendar algorithms in general as well as detailed algorithms for all of the calendar systems in common use today.
The Java Class Libraries, 2nd Edition, vol. 1, by Chan, Lee, and Kramer (Addison-Wesley, 1998) has a nice description of Calendar and GregorianCalendar.
Making your Java/C++/C Applications Global, at www.ibm.com/java/education/international-unicode/unicode1.html is a good overview of some of the issues involved in writing global applications.
Laura Werner is the manager of the Unicode Technology Group at IBM Cupertino. She joined Taligent in 1994 and moved to the Unicode group in 1997, just before Taligent was absorbed into IBM. Laura became the group's manager in 1999 and is now responsible for coordinating the Java and C++ Unicode efforts at IBM Cupertino, helping to architect the Unicode class libraries, and working on occasional side projects such as this paper. Laura holds Bachelor’s degrees in Geological Sciences and Integrated Science from Northwestern University.