"Java Liaison" column
October 1998
Richard Gillam

The Amorphous Java Program

If you've only programmed in traditional static programming languages like C++, you may find the experience of coding in Java rather disconcerting at first. In most languages, the end result of the build process is a single executable file. This isn't true in Java. In fact, the whole concept of "Java program" can be somewhat amorphous. This month, we'll take a look at the build process and the overall structure of a Java program.

Consider the following program, which reads a text file from standard input and collects some statistics on it:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.IOException;

public class GenerateStats {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                                new InputStreamReader(System.in));

        Counter[] counters = new Counter[] {
                                new WordCounter(),
                                new LineCounter() };

        processFile(in, counters);
    }

    static void processFile(BufferedReader in, Counter[] counters)
                                throws IOException {
        String line = in.readLine();
        while (line != null) {
            for (int i = 0; i < counters.length; i++)
                counters[i].processLine(line);
            line = in.readLine();
        }
        for (int i = 0; i < counters.length; i++)
            counters[i].dumpData(System.out);
    }
}

interface Counter {
    public void processLine(String line);
    public void dumpData(PrintStream out);
}

class WordCounter implements Counter {
    private int count = 0;
    private boolean lastCharWasSpace = true;

    public WordCounter() {}
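    public void processLine(String line) {
        // readLine() strips the line terminator, which is itself
        // whitespace, so every line starts outside a word.
        lastCharWasSpace = true;
        for (int i = 0; i < line.length(); i++) {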
            char c = line.charAt(i);
            if (Character.isWhitespace(c))
                lastCharWasSpace = true;
            else {
                if (lastCharWasSpace)
                    ++count;
                lastCharWasSpace = false;
            }
        }
    }

    public void dumpData(PrintStream out) {
        out.println("Word count = " + count);
    }
}

class LineCounter implements Counter {
    private int count = 0;

    public LineCounter() {}
    public void processLine(String line) {
        ++count;
    }

    public void dumpData(PrintStream out) {
        out.println("Line count = " + count);
    }
}


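There are many interesting things we could notice in this example, and we'll eventually explore all of them. For now, the first point to note is that Java has no global functions or variables: every function and variable must be declared inside a class, which is why even main() lives in GenerateStats.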

A quick digression on that first point: Some people have complained that Java forces a particular programming paradigm on you, and this is certainly true. In fact, it's true of all programming languages. In C++, the paradigm is so broad that you have to choose a sub-paradigm to work in (a "dialect" or "idiom"), but C++ still forces a particular view of the world on you. Java's programming paradigm is simply more restrictive. What isn't true is that Java forces object-oriented programming on you. Look at the GenerateStats class: all it contains are two static methods. The only reason it's a class at all is that you can't declare functions in the global name space. GenerateStats is a scoping vehicle, not a real object. There's nothing to prevent you from writing whole programs this way (although I don't know why anyone would want to).
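
A minimal sketch of that style, for illustration only: TextUtils and ScopingDemo are hypothetical names, not part of the example above. The private constructor is a common idiom for a class that exists only to hold static functions and was never meant to be instantiated.

```java
// ScopingDemo and TextUtils are hypothetical names used for illustration.
public class ScopingDemo {
    public static void main(String[] args) {
        // Called through the class name, much as you would call a
        // namespace-qualified function in C++.
        System.out.println(TextUtils.countWords("the quick  brown fox"));  // prints 4
    }
}

// A class used purely as a scoping vehicle: it holds static functions
// and is never instantiated.
final class TextUtils {
    // A private constructor prevents anyone from creating a TextUtils object.
    private TextUtils() {}

    // Counts runs of non-whitespace characters.
    static int countWords(String s) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                inWord = false;
            } else if (!inWord) {
                inWord = true;
                ++count;
            }
        }
        return count;
    }
}
```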

What executable?

Back to the structure of a Java program. For now, let's say that all the code above is in a single source file GenerateStats.java. If you're using the Sun Java Developer's Kit, as opposed to a third-party programming environment, you would compile this by typing the following at the command line:

javac GenerateStats.java

This would produce the following four files, one for each of the four type definitions (three classes and one interface) in the original source file:

GenerateStats.class
Counter.class
WordCounter.class
LineCounter.class

That's it. These four files are the program's "executable." You would run the program by typing

java GenerateStats

Each .class file contains a single compiled class (hence the name) in Java byte code, a platform-independent object file format similar in concept to UCSD Pascal p-code. The java program is an interpreter for executing Java byte code; it's usually referred to as the Java virtual machine, or JVM for short.
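
One way to see that a .class file really is a binary object file with a fixed format is to read its header. Every class file begins with the magic number 0xCAFEBABE. The sketch below (ClassFileHeader is a hypothetical name) reads the runtime's own String.class through Class.getResourceAsStream; it assumes the runtime exposes class files as resources, which current JDKs do.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassFileHeader {
    // Returns the first four bytes of the runtime's own String.class.
    // Every class file begins with the magic number 0xCAFEBABE.
    static int stringClassMagic() throws IOException {
        // Resolved relative to String's package, i.e. java/lang/String.class.
        InputStream raw = String.class.getResourceAsStream("String.class");
        if (raw == null) {
            throw new IOException("String.class is not readable as a resource");
        }
        try (DataInputStream in = new DataInputStream(raw)) {
            return in.readInt();   // big-endian, as the class file format specifies
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.printf("magic = %08X%n", stringClassMagic());  // prints magic = CAFEBABE
    }
}
```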

When you launch the JVM, you specify on the command line the name of the class you want to execute (GenerateStats, not GenerateStats.class). The JVM locates that class's .class file and starts execution by calling its main() function. If it can't find the .class file, or the class doesn't have a public static main() function with the appropriate signature, it reports an error; otherwise, the program executes.
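
As a rough analogy only, the launcher's lookup can be imitated with Java's reflection API: Class.forName finds the class, getMethod finds main(String[]), and Method.invoke calls it. MiniLauncher and Hello are hypothetical names, and the real java launcher is native code, not this class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

// A hypothetical imitation of the launcher: locate a class by name,
// find public static void main(String[]), and call it.
public class MiniLauncher {
    static void launch(String className, String[] programArgs) throws Exception {
        // Throws ClassNotFoundException if no class file can be found.
        Class<?> cls = Class.forName(className);
        // getMethod only sees public methods; throws NoSuchMethodException if absent.
        Method main = cls.getMethod("main", String[].class);
        if (!Modifier.isStatic(main.getModifiers()) || main.getReturnType() != void.class) {
            throw new NoSuchMethodException(className + ".main has the wrong signature");
        }
        // Cast to Object so the array is passed as a single argument.
        main.invoke(null, (Object) programArgs);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // With no arguments, run the sample Hello class below.
            launch("Hello", new String[] {"one", "two"});
        } else {
            launch(args[0], Arrays.copyOfRange(args, 1, args.length));
        }
    }
}

// A sample target class with its own entry point.
class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, with " + args.length + " argument(s)");
    }
}
```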

Dynamic linking

Let's leave aside the issue of Java's being an interpreted language for the time being; we'll explore that in the next column. Instead, let's focus on what's going on in the above example. As we observed, compiling a Java program produces a collection of .class files; there is no link stage. The link stage happens at run time. The first time a class refers to another class, the JVM goes out and locates an appropriate .class file and performs the link on the fly.
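
You can watch this on-demand behavior from inside a program. The sketch below (LazyDemo and Helper are hypothetical names) records when Helper's static initializer runs. One caveat: the JVM specification guarantees only that a class is initialized on its first active use; an implementation may load the class file earlier, so this demonstrates lazy initialization rather than the exact moment the file is read.

```java
public class LazyDemo {
    static final StringBuilder log = new StringBuilder();

    public static void main(String[] args) {
        log.append("main started;");
        if (args.length > 100) {
            Helper.hello();   // referenced in the code, but not executed here
        }
        log.append("before first use;");
        Helper.hello();       // first active use: Helper is initialized now
        System.out.println(log);
        // prints main started;before first use;Helper initialized;hello;
    }
}

class Helper {
    // Runs once, when Helper is initialized.
    static {
        LazyDemo.log.append("Helper initialized;");
    }

    static void hello() {
        LazyDemo.log.append("hello;");
    }
}
```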

There are several wonderful things about this approach:

The downside of this, of course, is that it takes extra time at run time to load and link classes as they're needed, and that problems that can be caught at compile time in C++ often can't be caught until run time in Java. Since the program is now distributed across a bunch of files, there's also the possibility of losing files along the way.
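
A small sketch of the run-time side of this trade-off: asking for a class by name only fails when the lookup actually happens. MissingClassDemo and com.example.NoSuchCounter are hypothetical names. In an ordinary program, a class that was present at compile time but has gone missing usually shows up as a NoClassDefFoundError instead, but the underlying cause is the same failed run-time lookup.

```java
public class MissingClassDemo {
    // Returns true if a class with the given fully qualified name can be
    // located and loaded at run time.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;   // discovered only now, not at compile time
        }
    }

    public static void main(String[] args) {
        System.out.println(isAvailable("java.io.BufferedReader"));     // prints true
        System.out.println(isAvailable("com.example.NoSuchCounter"));  // prints false
    }
}
```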

Packages and class paths

Obviously, you could also have a problem if you have no idea where to look for a specific .class file. Java deals with this problem by having conventions as to where .class files are placed.

Look at the example again. Notice the import statements at the top of the file. Classes act as the vehicle of namespace management for functions and variables. But since the whole universe could potentially be linked in at run time, you also need a way to manage the name space that classes are in. Java defines something called a package for this.

A package is simply a name space that contains classes. A class identifies itself as part of a package by having a package statement at the top of its source file. (Classes that don't declare a package, such as those in our example, are placed in a default, unnamed package.) The package name is prepended to the class name: if you have two classes named Foo defined in packages called bar and baz, you would refer to them as bar.Foo and baz.Foo.
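
The bar and baz packages are just placeholders, but the standard library has a real case of the same thing: java.util.Date and java.sql.Date share the simple name Date. This sketch (SameNameDemo is a hypothetical name) assumes a standard JDK where the java.sql classes are available.

```java
public class SameNameDemo {
    public static void main(String[] args) {
        // Fully qualified names keep the two Date classes apart.
        java.util.Date utilDate = new java.util.Date(0L);
        java.sql.Date sqlDate = new java.sql.Date(0L);
        System.out.println(utilDate.getClass().getName());  // prints java.util.Date
        System.out.println(sqlDate.getClass().getName());   // prints java.sql.Date
    }
}
```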

Visibility of methods and variables is controlled at the class level, and visibility of classes is controlled at the package level. Only classes declared public are visible outside of their package; those whose declarations have no access qualifier are internal to the package.

Package names are hierarchical. In the example above, we see a class named java.io.BufferedReader. It's in a package called java.io. String is in a package called java.lang. The packages themselves are not hierarchical, however: there is no special relationship (at the source code level) between java.io and java.lang just because their names both start with java.

The import statement, by the way, simply defines a shorthand: "import java.io.BufferedReader;" just allows us to say "BufferedReader" instead of "java.io.BufferedReader" whenever we refer to it in the rest of this source file. (We didn't have to do this for String or Character because they're in the java.lang package, which is imported by default.) If you leave out the imports and instead just say "java.io.BufferedReader" everywhere the example says "BufferedReader", everything still works: the import statement is not analogous to #include in C++.
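
A short sketch of that equivalence (ImportDemo is a hypothetical name): one class is used through its imported short name, the other is spelled out in full without an import, and both work the same way. The compiled class file refers to both by their fully qualified names; the import leaves no trace there.

```java
import java.io.BufferedReader;   // lets us write BufferedReader below
import java.io.IOException;

public class ImportDemo {
    // BufferedReader uses the imported short name; java.io.StringReader is
    // written out in full because it was never imported.
    static String firstLine(String text) throws IOException {
        try (BufferedReader in = new BufferedReader(new java.io.StringReader(text))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine("hello\nworld"));  // prints hello
    }
}
```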

This doesn't mean, however, that you have to wait until run time to find out whether a class file actually exists. That is also checked at compile time; in fact, javac will compile the appropriate source file if it finds a .java file, but not an up-to-date .class file, for a class you refer to. You only get into trouble at run time if the class file hierarchy that existed at compile time is disturbed (or not replicated correctly on the user's machine).

The hierarchical nature of package names doesn't matter inside your source code, but it does matter to the runtime environment. It's used to define the location of the class file. Each period-delimited segment of the package name is treated as a directory name. Thus, you would find java.io.BufferedReader by looking in the root level of the search for a directory called java, looking in the java directory for a directory called io, and looking in the io directory for a file called BufferedReader.class.
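
The name-to-location mapping is mechanical enough to write in a line or two. This is a hypothetical helper (ClassPathName is not a JDK class); it does nothing special for nested classes.

```java
import java.io.File;

public class ClassPathName {
    // Maps a fully qualified class name to the relative path of its class
    // file using '/', the separator used in class-loader resource names
    // and JAR entries.
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // The same mapping using the local file system's separator, as used
    // when searching a directory on the class path.
    static String filePath(String className) {
        return className.replace('.', File.separatorChar) + ".class";
    }

    public static void main(String[] args) {
        System.out.println(resourcePath("java.io.BufferedReader"));  // prints java/io/BufferedReader.class
        System.out.println(filePath("java.io.BufferedReader"));
    }
}
```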

The root level of the search for a class is known as the class path, and the user specifies the class path either through a command-line argument (-classpath) or an environment variable (CLASSPATH). (There's a default class path that the runtime will use if you don't specify either.) The class path can include more than one directory, and the directories are searched in the order they're listed. This gives you a way to replace a class in the Java runtime environment with one of your own: put your version, in the subdirectory matching its package, under a class path entry that is listed before the Java runtime's own entry.
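
The "first entry listed wins" rule can be sketched as a simple loop. This is a simplified, hypothetical model (ClassPathSearch is not a JDK class, and JAR or .zip entries are skipped), not the JVM's actual class loader. It does use the real java.class.path system property and File.pathSeparator (':' on Unix, ';' on Windows) to split the running JVM's class path.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ClassPathSearch {
    // Searches directory entries in class path order and returns the first
    // matching class file; earlier entries shadow later ones.
    static Optional<Path> findClassFile(List<Path> classPath, String className) {
        String relative = className.replace('.', File.separatorChar) + ".class";
        for (Path root : classPath) {
            if (!Files.isDirectory(root)) continue;   // JAR/.zip entries not handled here
            Path candidate = root.resolve(relative);
            if (Files.isRegularFile(candidate)) return Optional.of(candidate);
        }
        return Optional.empty();
    }

    // Splits the running JVM's class path using the platform's separator.
    static List<Path> currentClassPath() {
        List<Path> roots = new ArrayList<>();
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            if (!entry.isEmpty()) roots.add(Paths.get(entry));
        }
        return roots;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "ClassPathSearch";
        System.out.println(findClassFile(currentClassPath(), name)
                .map(Path::toString)
                .orElse(name + " not found in any class path directory"));
    }
}
```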

Bundling class files together

The problem of losing class files is not a trivial one: the bulk of the Java runtime environment itself is written in Java and stored in .class files. In version 1.1 of Java, Sun introduced something called the Java archive file, or "JAR" file for short, which solves this problem. A JAR file is basically a .zip file with some extra information added. You can create a JAR file using the jar utility in the JDK. The JVM can pull individual class files out of a JAR or .zip file without decompressing the whole archive. If either type of file exists in your class path, it is searched just as if it were a directory. In fact, the Java runtime environment itself is usually packaged together in a single file called classes.zip. If you don't specify another class path, the JVM will look for classes in classes.zip or in the current directory.
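
The same single-entry access is available to ordinary programs through java.util.jar. In this sketch, JarPeek and readEntry are hypothetical names; JarFile, getJarEntry, and getInputStream are the real API. It assumes Java 9 or later for InputStream.readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarPeek {
    // Reads one entry out of a JAR (or .zip) file without extracting the
    // rest of the archive, much as a class loader pulls out one .class file.
    static byte[] readEntry(Path jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            JarEntry entry = jar.getJarEntry(entryName);
            if (entry == null) {
                throw new IOException(entryName + " not found in " + jarPath);
            }
            try (InputStream in = jar.getInputStream(entry)) {
                return in.readAllBytes();   // Java 9+
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarPeek file.jar path/to/Entry.class");
            return;
        }
        byte[] data = readEntry(Paths.get(args[0]), args[1]);
        System.out.println(args[1] + ": " + data.length + " bytes");
    }
}
```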

Multiple entry points

One last thing: since programs are linked at run time, how do you know where the program's main entry point is? Well, the Java language requires that execution of a Java application program begin by calling a public static function called main() that returns void and takes an array of Strings as a parameter. But it doesn't place any restrictions on where that function lives. When you launch the JVM, you have to tell it which class contains the main() function you want to execute. This means that you can have a set of interrelated classes that each have a main() function. These functions could each use the same set of classes to do different things. This is why the concept of "Java program" can be amorphous: Is a set of classes like this a single "program" with multiple entry points, or multiple "programs" that share a lot of code?
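
A minimal sketch of such a set of classes, with hypothetical names WordCount, CharCount, and TextTools. Compiling this one file (WordCount.java) yields three .class files, and both "java WordCount a b c" and "java CharCount a b c" would run, each using the shared TextTools class.

```java
// Entry point 1: counts the words in its command-line arguments.
public class WordCount {
    public static void main(String[] args) {
        System.out.println("words = " + TextTools.countWords(String.join(" ", args)));
    }
}

// Entry point 2: counts non-whitespace characters in the same input.
class CharCount {
    public static void main(String[] args) {
        System.out.println("non-space chars = " + TextTools.countNonSpace(String.join(" ", args)));
    }
}

// Shared code used by both entry points.
final class TextTools {
    private TextTools() {}

    static int countWords(String text) {
        String t = text.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    static int countNonSpace(String text) {
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) n++;
        }
        return n;
    }
}
```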

Next time, we'll take a lower-level look at the Java runtime model, and then after that we'll start comparing and contrasting individual features. Hope to see you here again in two months.