"Java Liaison" column
October 1998
Richard Gillam
The Amorphous Java Program
If you've only programmed in traditional static programming languages like C++, you may find the experience of coding in Java rather disconcerting at first. In most languages, the end result of the build process is a single executable file. This isn't true in Java. In fact, the whole concept of "Java program" can be somewhat amorphous. This month, we'll take a look at the build process and the overall structure of a Java program.
Consider the following program, which reads a text file from standard input and collects some statistics on it:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.IOException;

public class GenerateStats {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in));
        Counter[] counters = new Counter[] { new WordCounter(),
                new LineCounter() };
        processFile(in, counters);
    }

    static void processFile(BufferedReader in, Counter[] counters)
            throws IOException {
        String line = in.readLine();
        while (line != null) {
            for (int i = 0; i < counters.length; i++)
                counters[i].processLine(line);
            line = in.readLine();
        }
        for (int i = 0; i < counters.length; i++)
            counters[i].dumpData(System.out);
    }
}

interface Counter {
    public void processLine(String line);
    public void dumpData(PrintStream out);
}

class WordCounter implements Counter {
    private int count = 0;
    private boolean lastCharWasSpace = true;

    public WordCounter() {}

    public void processLine(String line) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c))
                lastCharWasSpace = true;
            else {
                if (lastCharWasSpace)
                    ++count;
                lastCharWasSpace = false;
            }
        }
    }

    public void dumpData(PrintStream out) {
        out.println("Word count = " + count);
    }
}

class LineCounter implements Counter {
    private int count = 0;

    public LineCounter() {}

    public void processLine(String line) {
        ++count;
    }

    public void dumpData(PrintStream out) {
        out.println("Line count = " + count);
    }
}
There are many interesting things we could notice in this example, and we'll eventually explore all of them. For now, take note of the following things:
import statements.
A quick digression on that first point: Some people have complained that Java forces a particular programming paradigm on you, and this is certainly true. In fact, it's true of all programming languages. In C++, the paradigm is so broad that you have to choose a sub-paradigm you're going to use (a "dialect" or "idiom"), but C++ still forces a particular view of the world on you. Java's programming paradigm is simply more restrictive. What isn't true is that Java forces object-oriented programming on you. Look at the GenerateStats class: all it contains are two static methods. The only reason it is a class at all is that you can't declare functions in the global name space. GenerateStats is a scoping vehicle, not a real object. There's nothing to prevent you from writing whole programs this way (although I don't know why anyone would want to).
What executable?
Back to the structure of a Java program. For now, let's say that all the code above is in a single source file called GenerateStats.java. If you're using Sun's Java Development Kit (JDK), as opposed to a third-party programming environment, you would compile this by typing the following at the command line:
javac GenerateStats.java
This would produce the following four files, corresponding to the four class definitions in the original source file:
GenerateStats.class
Counter.class
WordCounter.class
LineCounter.class
That's it. These four files are the program's "executable." You would run the program by typing
java GenerateStats
Each .class file contains a single compiled class (hence the name) in Java byte code, a platform-independent object file format similar in concept to UCSD Pascal p-code. The java program is an interpreter for executing Java byte code; it's usually referred to as the Java virtual machine, or JVM for short. When you launch the JVM, you specify on the command line the name of the class you want to execute (that is, the name of its .class file without the extension). The JVM then starts execution by calling that class's main() function. If it can't find the appropriate .class file, or the class doesn't have a public function with the appropriate signature, it generates an error; otherwise, the program executes.
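To make the launch rule concrete, here is a minimal sketch that uses reflection to ask whether a class has the kind of main() function the JVM looks for. EntryPointCheck and hasEntryPoint are names invented for this illustration; they are not part of the JDK, and the real launcher does more than this.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper for illustration only; not part of the JDK.
class EntryPointCheck {
    // True if the class has a public static void main(String[]) function.
    static boolean hasEntryPoint(Class<?> c) {
        try {
            // getMethod only finds public methods, so "public" is implied.
            Method m = c.getMethod("main", String[].class);
            return Modifier.isStatic(m.getModifiers())
                    && m.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Each argument is a class name, just as on the java command line.
        for (String name : args) {
            Class<?> c = Class.forName(name);
            System.out.println(name
                    + (hasEntryPoint(c) ? " can" : " cannot")
                    + " be launched");
        }
    }
}
```

Running it as "java EntryPointCheck EntryPointCheck java.lang.String" should report that the first class can be launched and the second cannot.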
Dynamic linking
Let's leave aside the issue of Java's being an interpreted language for the time being; we'll explore that in the next column. Instead, let's focus on what's going on in the above example. As we observed, compiling a Java program produces a collection of .class files; there is no separate link stage in the build. Instead, linking happens at run time. The first time a class refers to another class, the JVM goes out and locates an appropriate .class file and performs the link on the fly.
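You can watch this on-demand behavior with a small experiment. The sketch below records the order of events; LazyLinkDemo and Helper are made-up names. Strictly speaking, a static initializer shows when a class is initialized rather than when its file is loaded, but it is a reasonable stand-in for "the first time the class is actually needed."

```java
import java.util.ArrayList;
import java.util.List;

class LazyLinkDemo {
    static final List<String> events = new ArrayList<>();

    // Compiled into its own file, LazyLinkDemo$Helper.class.
    static class Helper {
        // Runs once, the first time Helper is actually used.
        static { events.add("Helper initialized"); }

        static String greet() { return "hello from Helper"; }
    }

    static List<String> run() {
        events.add("run started");
        // First reference to Helper: it is set up here, not at startup.
        events.add(Helper.greet());
        return events;
    }

    public static void main(String[] args) {
        for (String event : run())
            System.out.println(event);
    }
}
```

On a first run this prints "run started" before "Helper initialized": nothing about Helper happens until the program gets to the line that uses it.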
There are several wonderful things about this approach:
The downside of this, of course, is that it takes extra time at run time to load and link classes as they're needed, and that problems that can be caught at compile time in C++ often can't be caught until run time in Java. Since the program is now distributed across a bunch of files, there's also the possibility of losing files along the way.
Packages and class paths
Obviously, you could also have a problem if you have no idea where to look for a specific .class file. Java deals with this problem by having conventions as to where .class files are placed.
Look at the example again. Notice the import statements at the top of the file. Classes act as the vehicle of namespace management for functions and variables. But since the whole universe could potentially be linked in at run time, you also need a way to manage the name space that classes are in. Java defines something called a package for this.
A package is simply a name space that contains classes. A class identifies itself as part of a package by having a package statement at the top of its source file (classes that don't declare a package, such as those in our example, are placed in a default package). The package name is prepended to the class name: if you have two classes named Foo defined in packages called bar and baz, you would refer to them as bar.Foo and baz.Foo.
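The standard library has real cases of this. As a small sketch, the class below looks at two unrelated classes that are both named Proxy and recovers the package part of each full name; QualifiedNames and packageOf are names made up for the illustration.

```java
class QualifiedNames {
    // Returns the package part of a class's full name ("" for the default package).
    static String packageOf(Class<?> c) {
        String name = c.getName();
        int lastDot = name.lastIndexOf('.');
        return lastDot < 0 ? "" : name.substring(0, lastDot);
    }

    public static void main(String[] args) {
        // Two different classes that share the simple name Proxy.
        Class<?>[] classes = { java.net.Proxy.class,
                               java.lang.reflect.Proxy.class };
        for (Class<?> c : classes)
            System.out.println(c.getSimpleName() + " in " + packageOf(c));
    }
}
```

The two classes never collide, because the source code always has a way to say which one it means.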
Visibility of methods and variables is controlled at the class level, and visibility of classes is controlled at the package level. Only classes declared public are visible outside of their package; those whose declarations have no qualifier are internal to the package.
Package names are hierarchical. In the example above, we see a class named java.io.BufferedReader. It's in a package called java.io. String is in a package called java.lang. The packages themselves are not hierarchical, however: there is no special relationship (at the source code level) between java.io and java.lang just because their names both start with java.
The import statement, by the way, simply defines a shorthand: "import java.io.BufferedReader;" just allows us to say "BufferedReader" instead of "java.io.BufferedReader" whenever we refer to it in the rest of this source file. (We didn't have to do this for String or Character because they're in the java.lang package, which is imported by default.) If you leave out the imports and instead just say "java.io.BufferedReader" everywhere the example says "BufferedReader", everything still works: the import statement is not analogous to #include in C++.
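Here is a minimal sketch of that claim: a class with no import statements at all that still uses classes from java.io, simply by spelling out their full names. NoImports and firstLine are names invented for the example.

```java
// No import statements anywhere in this file.
class NoImports {
    // Returns the first line of the given text, or null if it is empty.
    static String firstLine(String text) {
        try {
            java.io.BufferedReader in = new java.io.BufferedReader(
                    new java.io.StringReader(text));
            return in.readLine();
        } catch (java.io.IOException e) {
            // Not expected when reading from a string.
            throw new java.io.UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("first\nsecond"));   // prints "first"
    }
}
```

It is wordier than the version with imports, which is the whole reason the shorthand exists.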
This doesn't mean, however, that you have to wait until run time to see whether a class file actually exists. This is also checked at compile time; in fact, the javac program will compile the appropriate files if there is a .java file, but not an up-to-date .class file, for a desired class. You only get into trouble at run time if the class file hierarchy that existed at compile time is disturbed (or not replicated appropriately on the user's machine).
The hierarchical nature of package names doesn't matter inside your source code, but it does matter to the runtime environment. It's used to define the location of the class file. Each period-delimited segment of the package name is treated as a directory name. Thus, you would find java.io.BufferedReader by looking in the root level of the search for a directory called java, looking in the java directory for a directory called io, and looking in the io directory for a file called BufferedReader.class.
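The mapping from class name to file location is mechanical enough to write down in one line. The sketch below is only an illustration of the convention (ClassFileLocation and relativePath are made-up names), and it uses a forward slash as the directory separator for simplicity.

```java
class ClassFileLocation {
    // Turns a full class name into a path relative to the root of the search.
    static String relativePath(String qualifiedName) {
        return qualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // prints "java/io/BufferedReader.class"
        System.out.println(relativePath("java.io.BufferedReader"));
    }
}
```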
The root level of the search for a class is known as the class path, and the user specifies the class path either through a command-line argument or an environment variable. (There's a default class path that the runtime will use if you don't specify either.) The class path can include more than one directory, with each being searched in turn in the order they're listed. This gives you a way to replace a class in the Java runtime environment with one of your own: just place it in the appropriate place in a directory that you list earlier in the class path than the Java runtime is listed.
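The "first match wins" rule is easy to model. The following sketch is not how the JVM is implemented; it only imitates the search order over plain directories. ClassPathSearch, locate, and demo are invented names, and the two temporary directories stand in for "your directory" and "the runtime's directory."

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ClassPathSearch {
    // Returns the first file found under the roots, in list order, or null.
    static File locate(List<File> roots, String relativePath) {
        for (File root : roots) {
            File candidate = new File(root, relativePath);
            if (candidate.isFile())
                return candidate;
        }
        return null;
    }

    // Puts an (empty, placeholder) bar/Foo.class under two roots and
    // reports which root wins when "mine" is listed first.
    static String demo() {
        try {
            Path mine = Files.createTempDirectory("mine");
            Path runtime = Files.createTempDirectory("runtime");
            for (Path root : Arrays.asList(mine, runtime)) {
                Files.createDirectories(root.resolve("bar"));
                Files.createFile(root.resolve("bar").resolve("Foo.class"));
            }
            File hit = locate(
                    Arrays.asList(mine.toFile(), runtime.toFile()),
                    "bar/Foo.class");
            return hit.toPath().startsWith(mine) ? "mine" : "runtime";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "mine"
    }
}
```

Swap the order of the two roots in the list and the other copy is found instead, which is exactly the trick for overriding a class.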
Bundling class files together
The problem of losing class files is not a trivial one: the bulk of the Java runtime environment itself is written in Java and stored in .class files. In version 1.1 of Java, Sun introduced something called the Java archive file, or "JAR" file for short, which solves this problem. A JAR file is basically a .zip file with some extra information added. You can create a JAR file using the jar utility in the JDK. The JVM can pull individual class files out of a JAR or .zip file without decompressing the whole archive. If either type of file exists in your class path, it is searched just as if it were a directory. In fact, the Java runtime environment itself is usually packaged together in a single file called classes.zip. If you don't specify another class path, the JVM will look for classes in classes.zip or in the current directory.
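The same "pull out one entry" operation is available to ordinary programs through the java.util.jar classes. The sketch below writes a tiny JAR to a temporary file and then reads a single entry back by name. JarPeek and its methods are invented for the illustration, the entry holds placeholder text rather than real byte code, and readAllBytes assumes a Java 9 or later runtime.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

class JarPeek {
    // Writes a JAR containing a single entry.
    static void writeJar(File jar, String entryName, byte[] contents)
            throws IOException {
        try (JarOutputStream out =
                new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(contents);
            out.closeEntry();
        }
    }

    // Reads one entry by name without unpacking the rest of the archive.
    static byte[] readEntry(File jar, String entryName) throws IOException {
        try (JarFile archive = new JarFile(jar)) {
            JarEntry entry = archive.getJarEntry(entryName);
            if (entry == null)
                return null;
            try (InputStream in = archive.getInputStream(entry)) {
                return in.readAllBytes();   // Java 9 or later
            }
        }
    }

    static String demo() {
        try {
            File jar = File.createTempFile("demo", ".jar");
            jar.deleteOnExit();
            // Placeholder bytes, not a real compiled class.
            writeJar(jar, "bar/Foo.class",
                    "placeholder".getBytes(StandardCharsets.UTF_8));
            return new String(readEntry(jar, "bar/Foo.class"),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "placeholder"
    }
}
```

Notice that the entry name follows the same directory convention as a class file on disk, which is why an archive can be searched just as if it were a directory.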
Multiple entry points
One last thing: since programs are linked at run time, how do you know where the program's main entry point is? Well, the Java language requires that execution of a Java application program begin by calling a public static function called main() that returns void and takes an array of Strings as a parameter. But it doesn't place any restrictions as to where that function lives. You have to tell the JVM when you launch it which class contains the main() function you want to execute. This means that you can have a set of interrelated classes that each have a main() function. These functions could each use the same set of classes to do different things. This is why the concept of "Java program" can be amorphous: Is a set of classes like this a single "program" with multiple entry points, or multiple "programs" that share a lot of code?
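As a small sketch of the situation, here are three classes, all with made-up names: one holds shared code, and the other two each supply their own main() on top of it.

```java
// Shared code used by both entry points.
class TextStats {
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    static int countChars(String text) {
        return text.length();
    }
}

// One entry point: counts the words in its command-line arguments.
class WordTool {
    public static void main(String[] args) {
        System.out.println("Word count = "
                + TextStats.countWords(String.join(" ", args)));
    }
}

// Another entry point over the same shared class.
class CharTool {
    public static void main(String[] args) {
        System.out.println("Character count = "
                + TextStats.countChars(String.join(" ", args)));
    }
}
```

After compiling, "java WordTool some text here" and "java CharTool some text here" both work, and both lean on TextStats.class. Whether that is one program or two is up to you.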
Next time, we'll take a lower-level look at the Java runtime model, and then after that we'll start comparing and contrasting individual features. Hope to see you here again in two months.