Getting Started with HotSpot and OpenJDK

In this article, we’ll look at how to get started working on the HotSpot Java Virtual Machine and its implementation in the OpenJDK open-source project, both from a virtual machine (VM) perspective and in terms of its interaction with the Java class libraries.

Introduction to the HotSpot source

Let’s take a look at the JDK source and the implementations of Java concepts contained within. There are two main ways to examine the source code:

  • attach src.zip (from $JAVA_HOME) in a modern IDE, which gives access to the source from within the IDE, or
  • check out the OpenJDK source code and navigate the file system.

Both approaches are useful, but it’s important to be comfortable with the second as well as the first. The OpenJDK source code is stored in Mercurial (a distributed version control system similar to the ubiquitous Git version control system). If you aren’t familiar with Mercurial, there’s a free book called “Version Control By Example” which covers the basics.

To check out the OpenJDK 7 source code, install the Mercurial command line tool, and then:

hg clone http://hg.openjdk.java.net/jdk7/jdk7 jdk7_tl

This will produce a local copy of the OpenJDK repository. This repo has the basic layout of the project, but does not yet contain all the files, because the OpenJDK project is spread across several sub-repositories.

After the initial clone, the local repo will look like this:

ariel-2:jdk7_tl boxcat$ ls -l
total 664
-rw-r--r--  1 boxcat staff   1503 14 May 12:54 ASSEMBLY_EXCEPTION
-rw-r--r--  1 boxcat staff  19263 14 May 12:54 LICENSE
-rw-r--r--  1 boxcat staff  16341 14 May 12:54 Makefile
-rw-r--r--  1 boxcat staff   1808 14 May 12:54 README
-rw-r--r--  1 boxcat staff 110836 14 May 12:54 README-builds.html
-rw-r--r--  1 boxcat staff 172135 14 May 12:54 THIRD_PARTY_README
drwxr-xr-x 12 boxcat staff    408 14 May 12:54 corba
-rwxr-xr-x  1 boxcat staff   1367 14 May 12:54 get_source.sh
drwxr-xr-x 14 boxcat staff    476 14 May 12:55 hotspot
drwxr-xr-x 19 boxcat staff    646 14 May 12:54 jaxp
drwxr-xr-x 19 boxcat staff    646 14 May 12:55 jaxws
drwxr-xr-x 13 boxcat staff    442 16 May 16:01 jdk
drwxr-xr-x 13 boxcat staff    442 14 May 12:55 langtools
drwxr-xr-x 18 boxcat staff    612 14 May 12:54 make
drwxr-xr-x  3 boxcat staff    102 14 May 12:54 test

Next, run the get_source.sh script, which was pulled down as part of the initial clone. This populates the rest of the project by cloning the sub-repositories needed to actually build OpenJDK.

Before we delve into a full discussion of the source code, it’s important to say: “Don’t be afraid of the platform source code”. Developers quite often think that the JDK source must be awe-inspiring and unapproachable; after all, it’s the core of the platform.

The JDK source is solid, well-reviewed and well-tested, but it is also very approachable. It has not always been updated to use the latest Java language features, so it is fairly common to find internal classes that, for example, are still not generified and use raw types throughout.

There are several major repositories for the JDK source code that you should be familiar with:

jdk

This is where the class libraries live. These are mostly Java (with some C code for native methods). This is a great starting point for getting into the OpenJDK source code. The classes of the JDK are in jdk/src/share/classes.

hotspot

The HotSpot VM - this is C/C++ and assembly code (with some Java-based VM dev tools). It is quite advanced, and can be a bit daunting to start with if you’re not a hardcore C/C++ developer. We’ll discuss some good paths into it in more detail later on.

langtools

For people interested in compilers and tool development, this is where the language and platform tools (such as javac and javadoc) can be found. It is mostly Java code; not as easy to get into as the jdk code, but it should be accessible to most developers.

There are also other repos that are potentially less important or interesting to most developers, covering things like corba, jaxp and jaxws.

Building OpenJDK

Oracle recently kicked off a project to completely overhaul and simplify the OpenJDK build infrastructure. This project, known as “build-infra”, is now complete, and its build system is the standard way to build OpenJDK. For many users on Unix-based systems, a build is now as simple as installing a compiler and a “bootstrap JDK”, and then running these three commands:

./configure
make clean
make images

For more detail on building your own OpenJDK and getting started with hacking it, the AdoptOpenJDK programme (founded by the London Java Community) is a great place to start - it’s a community of almost 100 grassroots developers working on such projects as warnings cleanup, small bug fixes and compatibility testing of OpenJDK 8 with major open-source projects.

Understanding the HotSpot runtime environment

The Java runtime environment as provided by OpenJDK consists of the HotSpot JVM combined with class libraries (which are largely bundled up into rt.jar).

As Java is a portable environment, anything that requires a call into the operating system is ultimately handled by a native method. In addition, some methods require special support from the JVM (e.g. classloading). These too are handed off to the JVM via a native call.

For example, let’s look at the C source for the native methods of the primordial Object class. The native source for Object is contained in jdk/src/share/native/java/lang/Object.c and it has six methods.

The Java Native Interface (JNI) usually requires the C implementations of native methods to be named in a very specific way. For example, the native method Object.getClass() uses the usual naming convention, so its C implementation is a function with this signature:

JNIEXPORT jclass JNICALL
Java_java_lang_Object_getClass(JNIEnv *env, jobject this)
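The naming convention can be sketched in Java. JniNames is a hypothetical helper, not a JDK API. It applies the JNI specification’s escaping rules to build the “short” name: Java_, then the fully qualified class name, an underscore, and the method name. The “long” form used to disambiguate overloaded native methods (which appends two underscores and the mangled argument signature) is deliberately left out.

```java
// Sketch of the JNI "short name" rule for native method implementations.
final class JniNames {
    // Escape an identifier the way the JNI specification describes:
    // '.' and '/' become '_', '_' becomes "_1", ';' becomes "_2",
    // '[' becomes "_3", and any other non-alphanumeric or non-ASCII
    // character becomes "_0" followed by four lowercase hex digits.
    static String mangle(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                out.append(c);
            } else if (c == '.' || c == '/') {
                out.append('_');
            } else if (c == '_') {
                out.append("_1");
            } else if (c == ';') {
                out.append("_2");
            } else if (c == '[') {
                out.append("_3");
            } else {
                out.append(String.format("_0%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Short name: "Java_" + mangled class name + "_" + mangled method name.
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("java.lang.Object", "getClass"));
    }
}
```

Running main prints the name used in Object.c, Java_java_lang_Object_getClass. An underscore inside a package or method name is escaped, so a hypothetical method do_it maps to ..._do_1it rather than being confused with a package separator.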

JNI has another way of binding native methods to their C implementations, which is used by the remaining five native methods of java.lang.Object:

static JNINativeMethod methods[] = {
    {"hashCode",  "()I",                    (void *)&JVM_IHashCode},
    {"wait",      "(J)V",                   (void *)&JVM_MonitorWait},
    {"notify",    "()V",                    (void *)&JVM_MonitorNotify},
    {"notifyAll", "()V",                    (void *)&JVM_MonitorNotifyAll},
    {"clone",     "()Ljava/lang/Object;",   (void *)&JVM_Clone},
};

These five methods are mapped to JVM entry points (designated by the JVM_ prefix on the C function name) using the registerNatives() mechanism, which lets the developer map Java native methods to arbitrary C function names instead of relying on the naming convention.
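To see which methods of Object are native on the JVM you are actually running, reflection is enough. ObjectNatives is a hypothetical helper name. The exact set varies between JDK releases, since later versions have reorganised some of these methods, but getClass and hashCode are still declared native.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

final class ObjectNatives {
    // Collect the names of methods declared directly on java.lang.Object
    // that carry the native modifier (sorted, duplicates collapsed).
    static Set<String> nativeMethodNames() {
        Set<String> names = new TreeSet<>();
        for (Method m : Object.class.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(nativeMethodNames());
    }
}
```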

The general picture is that, as far as possible, the Java runtime environment is written in Java, with only a few small places where the JVM needs to become involved. Apart from executing code, the JVM’s main job is housekeeping: maintaining the environment in which the runtime representations of live Java objects reside, namely the Java heap.

OOPs & KlassOOPs

Any Java object in the heap is represented by an Ordinary Object Pointer (OOP). An OOP is a genuine pointer in the C / C++ sense: a machine word which points to a memory location inside the Java heap. The Java heap is allocated as a single contiguous address range in the JVM process’s virtual address space, and memory within it is then managed purely in user space by the JVM process itself, unless the JVM needs to resize the heap for some reason.

This means that creation and collection of Java objects does not normally involve system calls to allocate or deallocate memory.
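A toy model makes the point about system calls concrete: once a contiguous range has been reserved, allocating an object can be as simple as advancing a “top” offset. This is a simplified sketch, not HotSpot’s allocator (HotSpot’s fast path bumps a pointer inside thread-local allocation buffers and has slower paths as well); the BumpHeap class and its 8-byte alignment are assumptions for illustration.

```java
// Toy model: the heap is one contiguous range reserved up front, and
// allocation just advances a "top" offset. No operating system call is made.
final class BumpHeap {
    private static final int ALIGNMENT = 8; // assumed object alignment in bytes
    private final long capacity;            // size of the reserved range, in bytes
    private long top = 0;                   // offset of the next free byte

    BumpHeap(long capacity) {
        this.capacity = capacity;
    }

    // Returns the offset of the new object, or -1 if the range is exhausted
    // (the point at which a real JVM would trigger a garbage collection).
    long allocate(long size) {
        long aligned = (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
        if (top + aligned > capacity) {
            return -1;
        }
        long result = top;
        top += aligned;
        return result;
    }

    long used() {
        return top;
    }
}
```

For example, in a 64-byte BumpHeap, a 24-byte allocation lands at offset 0 and a following 12-byte request is rounded up to 16 bytes and lands at offset 24.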

An OOP consists of a header of two machine words, called the Mark word and the Klass word, followed by the member fields of the instance. An array has one extra header word, holding the array’s length, before its elements.

We will have a lot more to say later about the Mark and Klass words, but their names are deliberately suggestive - the Mark word is used in garbage collection (in the mark part of mark-and-sweep) and the Klass word is used as a pointer to class metadata.

The fields of the instance are laid out in a very specific order in the bytes following the OOP header. For the precise details read Nitsan Wakart’s excellent blog post "Know thy Java Object Memory Layout".

Both primitive and reference fields are laid out after the OOP header, and object references are, of course, also OOPs. Let’s look at an example, the Entry class (as used in java.util.HashMap):

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    // methods...
}

Now, let’s calculate the size of an Entry object (on a 32-bit JVM).

The header comprises a Mark word and a Klass word, so the OOP header is 8 bytes on 32-bit HotSpot (and 16 bytes on 64-bit HotSpot when pointers are not compressed).

Using the definition of an OOP, the overall size is 2 machine words plus the size of all instance fields.

Fields of reference type show up as pointers - which will be a machine word size on any sane processor architecture.

Entry has one int field and three reference fields (key and value, of types K and V, plus next, of type Entry), giving a total size of 2 words (header) + 1 word (int) + 3 words (pointers).

This is 24 bytes (6 words) total to store a single HashMap.Entry object.
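The same arithmetic can be written as a small calculator. It is a simplified model under stated assumptions: uncompressed pointers, a 4-byte int, and rounding the total up to an 8-byte object alignment, while ignoring field reordering and internal padding. ShallowSize is a hypothetical helper, not a JDK API.

```java
// Simplified shallow-size model following the rule above:
// header (Mark + Klass words) + the size of each instance field,
// rounded up to an assumed 8-byte object alignment.
final class ShallowSize {
    static long sizeOf(int wordBytes, int referenceFields, int intFields) {
        long header = 2L * wordBytes;                    // Mark word + Klass word
        long refs = (long) referenceFields * wordBytes;  // uncompressed pointers
        long ints = 4L * intFields;                      // a Java int is always 4 bytes
        long raw = header + refs + ints;
        return (raw + 7) / 8 * 8;                        // round up to 8-byte alignment
    }

    public static void main(String[] args) {
        // HashMap.Entry: key, value, next (3 references) + hash (1 int)
        System.out.println("32-bit: " + sizeOf(4, 3, 1));
        System.out.println("64-bit, uncompressed: " + sizeOf(8, 3, 1));
    }
}
```

On 32-bit this reproduces the 24 bytes above. On 64-bit the same Entry comes out at 48 bytes: the header and pointers double in size, the int stays at 4 bytes, and the 44-byte total is padded to the next multiple of 8.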

KlassOOPs

The Klass word of the header is one of the most important parts of an OOP. It is a pointer to the metadata for this class, which is represented as a C++ type called a klassOop. Of particular importance amongst this metadata are the methods of the class, which are expressed as a virtual method table (a “vtable”).

We don’t want every instance to carry around all of the details of their methods - that would be very inefficient - so the use of a vtable on the klassOop is a good way of sharing that information among instances.

It’s also important to note that the klassOops are different from the Class objects that are the result of class-loading operations. The difference between the two can be summarized as:

  • Class objects (e.g. String.class) are just regular Java objects - they’re represented as OOPs like any other Java objects (instanceOops) and have the same behaviour as any other objects, and they can be put into Java variables.
  • klassOops are the JVM’s representation of class metadata; they carry the methods of the class in a vtable structure. It is not possible to obtain a reference to a klassOop directly from Java code, and they live in the PermGen area of the heap.

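An easy way to remember this distinction is to think of the klassOop as the JVM-level counterpart of the Class object for the relevant class. (HotSpot’s own terminology runs the other way: the Class object is called the klass’s “Java mirror”.)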

Virtual Dispatch

The vtable structure of klassOops is directly relevant to Java’s method dispatch and single inheritance. Remember that Java’s instance method dispatch is virtual by default (so methods are looked up using the runtime type of the object on which the method is called).

This is implemented in klassOop vtables by using constant vtable offsets: an overriding method sits at the same offset in the vtable as the implementation it overrides in the parent (or grandparent, etc.).

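Because each class’s vtable starts out as a copy of its superclass’s vtable, with overriding methods replacing the inherited entries at those same offsets, virtual dispatch does not need to walk up the inheritance hierarchy at call time. The JVM follows the receiver’s Klass word, reads the entry at the method’s fixed vtable offset, and calls whatever implementation is stored there; the hierarchy only has to be examined once, when the vtable is built.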

For example, this means that the toString() method is at the same vtable offset in every class. This layout depends on single inheritance (each class extends exactly one superclass vtable), and it also allows some very powerful optimizations when JIT-compiling code.
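The constant-offset idea can be modelled in a few lines of Java. This is a toy sketch, not HotSpot’s vtable code; ToyKlass, ToyObject and VtableDemo are hypothetical names. Each class’s table starts as a copy of its superclass’s table, overrides replace entries in place, and new methods are appended, so a call site only needs the offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// A toy "klass": a name plus a vtable of method implementations.
final class ToyKlass {
    final String name;
    final List<Function<ToyObject, String>> vtable = new ArrayList<>();

    ToyKlass(String name, ToyKlass superKlass) {
        this.name = name;
        // Start from a copy of the superclass table so inherited methods keep their offsets.
        if (superKlass != null) {
            vtable.addAll(superKlass.vtable);
        }
    }

    // A brand-new virtual method is appended; its offset is fixed from now on.
    int addMethod(Function<ToyObject, String> impl) {
        vtable.add(impl);
        return vtable.size() - 1;
    }

    // An override replaces the inherited entry at the same offset.
    void override(int offset, Function<ToyObject, String> impl) {
        vtable.set(offset, impl);
    }
}

// A toy object: its only header field plays the role of the Klass word.
final class ToyObject {
    final ToyKlass klass;

    ToyObject(ToyKlass klass) {
        this.klass = klass;
    }

    // Virtual dispatch: one indexed lookup in the receiver's table, no hierarchy walk.
    String invokeVirtual(int offset) {
        return klass.vtable.get(offset).apply(this);
    }
}

final class VtableDemo {
    public static void main(String[] args) {
        ToyKlass object = new ToyKlass("Object", null);
        int toStringSlot = object.addMethod(o -> o.klass.name + "@default");

        ToyKlass animal = new ToyKlass("Animal", object);
        int speakSlot = animal.addMethod(o -> "...");

        ToyKlass dog = new ToyKlass("Dog", animal);
        dog.override(toStringSlot, o -> "Dog!");
        dog.override(speakSlot, o -> "Woof");

        System.out.println(new ToyObject(animal).invokeVirtual(toStringSlot)); // Animal@default
        System.out.println(new ToyObject(dog).invokeVirtual(toStringSlot));    // Dog!
        System.out.println(new ToyObject(dog).invokeVirtual(speakSlot));       // Woof
    }
}
```

The offset for the toString-like method is 0 in every ToyKlass, which mirrors the toString() example above.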


MarkOOPs

The Mark word of the OOP header is declared in the HotSpot source as a pointer type (markOop), but it is really just a collection of bit-fields that hold housekeeping information about the OOP.

Under normal circumstances on a 32-bit JVM, the bitfields of the mark structure look like this (see hotspot/src/share/vm/oops/markOop.hpp for more details):

hash:25 age:4 biased_lock:1 lock:2

The high 25 bits hold the object’s identity hash code (the value Object.hashCode() returns when it is not overridden), followed by 4 bits for the age of the object (the number of garbage collections it has survived). The remaining 3 bits indicate the synchronization lock status of the object.
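Extracting those fields is plain bit manipulation. The sketch below assumes the 32-bit normal-object layout shown above, with the lock bits as the least significant bits. MarkWord32 is a hypothetical helper; HotSpot itself uses the masks and shifts defined in markOop.hpp.

```java
// Decodes a 32-bit "normal" mark word laid out as
// hash:25 age:4 biased_lock:1 lock:2 (most significant bits first).
final class MarkWord32 {
    static int lock(int mark)       { return mark & 0x3; }          // bits 0-1
    static int biasedLock(int mark) { return (mark >>> 2) & 0x1; }  // bit 2
    static int age(int mark)        { return (mark >>> 3) & 0xF; }  // bits 3-6
    static int hash(int mark)       { return mark >>> 7; }          // bits 7-31 (25 bits)

    // Builds a mark word from its fields (the inverse of the accessors above).
    static int encode(int hash, int age, int biasedLock, int lock) {
        return ((hash & 0x1FFFFFF) << 7)
                | ((age & 0xF) << 3)
                | ((biasedLock & 0x1) << 2)
                | (lock & 0x3);
    }
}
```

Because the age field has only 4 bits, it can count at most 15 collections, which is why HotSpot’s tenuring threshold cannot exceed 15.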

Java 5 introduced a new approach to object synchronization, called Biased Locking (and it was made the default in Java 6). The idea is based around the observed runtime behaviour of objects - in many cases objects are only ever locked by one thread.

In biased locking an object is “biased” towards the first thread that locks it - and this thread then achieves much better locking performance. The thread which has acquired the bias is recorded in the mark header:

JavaThread*:23 epoch:2 age:4 biased_lock:1 lock:2

If another thread attempts to lock the object, then the biasing is revoked (and it will not be reacquired) - and from then on all threads must explicitly lock and unlock the object.

The possible states for the object are then:

  • Unlocked
  • Biased
  • Lightweight Locked
  • Heavyweight Locked
  • Marked (only possible during Garbage Collection)
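These states can be told apart from the low three bits of the mark word (biased_lock:1 lock:2). The sketch below uses the bit patterns documented in markOop.hpp: 01 unlocked, 101 biased, 00 lightweight locked, 10 inflated (heavyweight) monitor, 11 marked. LockState and MarkWordState are hypothetical names for illustration.

```java
// The five states listed above.
enum LockState { UNLOCKED, BIASED, LIGHTWEIGHT_LOCKED, HEAVYWEIGHT_LOCKED, MARKED }

// Classifies an object's state from the low three bits of its mark word.
final class MarkWordState {
    static LockState classify(long mark) {
        int lowBits = (int) (mark & 0x7); // biased_lock bit + 2 lock bits
        int lock = lowBits & 0x3;
        switch (lock) {
            case 0:  return LockState.LIGHTWEIGHT_LOCKED; // 00: points to a lock record on the owner's stack
            case 2:  return LockState.HEAVYWEIGHT_LOCKED; // 10: points to an inflated monitor
            case 3:  return LockState.MARKED;             // 11: only used during garbage collection
            default: // 01: unlocked, or biased when the biased_lock bit is also set (101)
                return lowBits == 0x5 ? LockState.BIASED : LockState.UNLOCKED;
        }
    }
}
```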

OOPs in HotSpot Source

There is quite a complex hierarchy of related OOP types in the HotSpot source. These types are kept in hotspot/src/share/vm/oops and include:

  • oop (abstract base)
  • instanceOop (instance objects)
  • methodOop (representations of methods)
  • arrayOop (array abstract base)
  • symbolOop (internal symbol / string class)
  • klassOop
  • markOop

There are some slightly strange historical accidents - the contents of the virtual dispatch tables (vtables) are kept separate from the klassOops, and the markOop looks nothing like the other oops, yet is still contained in the same hierarchy.

One place where OOPs can be seen directly is in the output of the jmap command-line tool. This gives a quick snapshot of the contents of the heap, including any oops present in PermGen (which include the subclasses and supporting structures required for klassOops).

$ jmap -histo 150 | head -18
num     #instances        #bytes  class name
----------------------------------------------
  1:        10555      21650048   [I
  2:       272357       6536568   java.lang.Double
  3:        25163       5670768   [Ljava.lang.Object;
  4:       229099       5498376   com.jclarity.censum.dataset.CensumXYDataItem
  5:        39021       5470944   <constMethodKlass>
  6:        39021       5319320   <methodKlass>
  7:         8269       4031248   [B
  8:         3161       3855136   <constantPoolKlass>
  9:       119759       2874216   org.jfree.data.xy.XYDataItem
 10:         3161       2773120   <instanceKlassKlass>
 11:         2894       2451648   <constantPoolCacheKlass>
 12:        34012       2271576   [C
 13:        87065       2089560   java.lang.Long
 14:        20897       2006112   [Lcom.jclarity.censum.CollectionType;
 15:        33798       1081536   java.util.HashMap$Entry

The entries in angle brackets are oops of various kinds, whilst the entries such as [I and [B refer to arrays of ints and bytes, respectively.
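The array names in the histogram use the same encoding that Class.getName() returns for array classes: one '[' per dimension, then a one-letter code for a primitive element type or L<class name>; for a reference type. ArrayNames is just an illustrative class name.

```java
import java.util.Arrays;
import java.util.List;

final class ArrayNames {
    // Names of the array classes seen in the histogram, plus a 2-D example.
    static List<String> arrayNames() {
        return Arrays.asList(
                int[].class.getName(),      // "[I"
                byte[].class.getName(),     // "[B"
                char[].class.getName(),     // "[C"
                Object[].class.getName(),   // "[Ljava.lang.Object;"
                String[][].class.getName()  // "[[Ljava.lang.String;"
        );
    }

    public static void main(String[] args) {
        arrayNames().forEach(System.out::println);
    }
}
```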

The HotSpot Interpreter

HotSpot’s interpreter is more advanced than the simplistic “switch in a while loop” style of interpreter that is usually more familiar to developers.
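For contrast, a “switch in a while loop” interpreter looks roughly like this. The instruction set is hypothetical and far smaller than JVM bytecode; the point is the fetch, decode and dispatch cycle that every instruction goes through.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy switch-based interpreter for a hypothetical stack machine
// (these opcodes are invented for illustration, not JVM bytecodes).
final class SwitchInterpreter {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc++];          // fetch
            switch (opcode) {                 // decode and dispatch
                case PUSH: stack.push(code[pc++]); break;
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();
                default:   throw new IllegalStateException("bad opcode " + opcode);
            }
        }
    }
}
```

The program {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT} computes (2 + 3) * 4 and returns 20.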

Instead, HotSpot uses a template interpreter. This means that, when the VM starts up, it builds a dispatch table of optimised machine code that is specific to the operating system and CPU in use. The majority of bytecode instructions are implemented as assembly language code, with only the more complex instructions, such as looking up an entry from a classfile’s constant pool, being delegated back to the VM.

This improves HotSpot’s interpreter performance at a cost of making it more difficult to port the VM to new architectures and operating systems. It also makes the interpreter harder to understand for new developers.
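The table-driven structure can be mimicked at the Java level, although the analogy is loose: HotSpot fills its table with machine code generated at VM startup, while this sketch uses Java lambdas and generates nothing. TableInterpreter, its Handler interface and its opcodes are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Java-level analogy for a dispatch table: one handler per opcode, built once,
// and the interpreter jumps straight to TABLE[opcode] instead of using a switch.
final class TableInterpreter {
    interface Handler {
        // Executes one instruction and returns the next pc, or -1 to stop.
        int execute(int[] code, int pc, Deque<Integer> stack);
    }

    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;
    static final Handler[] TABLE = new Handler[4];

    static {
        // Built once, loosely analogous to generating templates at startup.
        TABLE[PUSH] = (code, pc, stack) -> { stack.push(code[pc + 1]); return pc + 2; };
        TABLE[ADD]  = (code, pc, stack) -> { stack.push(stack.pop() + stack.pop()); return pc + 1; };
        TABLE[MUL]  = (code, pc, stack) -> { stack.push(stack.pop() * stack.pop()); return pc + 1; };
        TABLE[HALT] = (code, pc, stack) -> -1;
    }

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc >= 0) {
            pc = TABLE[code[pc]].execute(code, pc, stack);
        }
        return stack.pop();
    }
}
```

Adding an instruction means adding a table entry rather than a case; in HotSpot the equivalent entry is CPU-specific generated code, which is where the portability cost described above comes from.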

In terms of getting started, it’s often better for developers to gain a basic understanding of the runtime environment provided by OpenJDK:

  • Most of the environment written in Java
  • Operating system portability achieved with native methods
  • Java objects represented in the heap as OOPs
  • Class metadata expressed in the JVM as KlassOOPs
  • An advanced template interpreter for high performance even in interpreted mode

From there, developers can start to explore the Java code in the jdk repository, or seek to build up their C / C++ and assembler knowledge to delve deeper into HotSpot itself.

About the Author

Ben Evans is the CEO of jClarity, a startup which delivers performance tools to help development & ops teams. He is an organizer for the LJC (London JUG) and a member of the JCP Executive Committee, helping define standards for the Java ecosystem. He is a Java Champion; JavaOne Rockstar; co-author of “The Well-Grounded Java Developer” and a regular public speaker on the Java platform, performance, concurrency, and related topics.
