Getting to Know Graal, the New Java JIT Compiler

Posted by Ben Evans, reviewed by Victor Grazi on Jul 16, 2018. Estimated reading time: 19 minutes

Key Takeaways

  • Java's C2 JIT compiler is end-of-life
  • The new JVMCI compiler interface allows new compilers to be plugged in
  • Oracle have developed Graal, a JIT written in Java, as the intended replacement
  • Graal also works standalone and is a major component in a new platform
  • GraalVM is a next-generation polyglot VM that supports many languages (not just those that compile to JVM bytecode)

Oracle's implementation of Java is based on the open-source OpenJDK project, and that includes the HotSpot virtual machine, which has been around since Java 1.3. HotSpot contains two separate JIT compilers, known as C1 and C2 (sometimes called "client" and "server"), and a modern Java installation uses both JIT compilers during normal program execution.

A Java program starts off in interpreted mode. After a bit of execution, frequently called methods are identified and compiled - first using C1 and then, if HotSpot detects an even higher number of calls, the method will be recompiled using C2. This strategy is known as "Tiered Compilation" and is the default approach taken by HotSpot.

For most Java apps, this means that the C2 compiler is one of the most important pieces of the environment, as it produces the heavily optimized machine code that corresponds to the most important parts of the program.
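
To watch tiered compilation at work, a small program with one obviously hot method is enough. The sketch below is only an illustration: the class and method names (HotLoop, square, sumOfSquares) are made up for this example, and the exact compile thresholds depend on the JVM version and flags. Running it with -XX:+PrintCompilation should list the hot methods, with the compilation level shown as a small number just before the method name (levels 1 to 3 are C1, level 4 is C2).

```java
// Hypothetical demo class: the names are illustrative and not from the article.
final class HotLoop {

    // A tiny method that becomes "hot" once it has been called many times.
    static long square(long x) {
        return x * x;
    }

    // Sums the squares of 0..n-1, calling square() n times.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // One million calls is normally well past HotSpot's default compile thresholds.
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

Depending on inlining decisions, square() may only appear as part of the compiled sumOfSquares() rather than as a separate entry.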

C2 has been enormously successful and can produce code that is competitive with (or faster than) C++, due to runtime optimizations that are not available to an Ahead of Time (AOT) compiler like gcc or the Go compiler.

However, C2 has been delivering diminishing returns, and no major improvements have been implemented in the compiler in the last several years. The code in C2 has also become very hard to maintain and extend, and it is very hard for new engineers to get up to speed with the codebase, which is written in a specific dialect of C++.

In fact, it is widely believed (by companies such as Twitter, and experts such as Cliff Click) that no more major enhancements are possible within the current design. This means that any remaining improvements in C2 will be somewhat marginal.

One of the only areas that has seen improvements in recent releases is the use of more JVM intrinsics, a technique described in the documentation (for the @HotSpotIntrinsicCandidate annotation) like this:

A method is intrinsified if the HotSpot VM replaces the annotated method with hand-written assembly and/or handwritten compiler IR - a compiler intrinsic to improve performance.

When the JVM starts up, the processor it is executing on is probed. This allows the JVM to see exactly what features the CPU has available. It builds a table of intrinsics that are specific to the processor in use. That means that the JVM can take full advantage of the hardware's capabilities.

This is unlike AOT compilation, which has to compile for a generic chip and make conservative assumptions about which features are available, because an AOT-compiled binary will crash if it tries to run instructions that are not supported on the CPU present at runtime.

HotSpot already supports quite a few intrinsics - for example the well-known Compare-And-Swap (CAS) instruction that is used to implement functionality such as atomic integers. On almost all modern processors, this is implemented using a single hardware instruction.
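
As a minimal sketch of what this looks like from the Java side, the class below builds a counter on top of AtomicInteger.compareAndSet() using the usual CAS retry loop. The class name CasCounter is made up for this example. Whether the call ends up as a single hardware instruction depends on the platform and on the JIT compiler, so treat that part as an expectation rather than a guarantee.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class showing the classic compare-and-swap retry loop.
final class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Read the current value, compute the next one, and try to swap it in.
    // If another thread changed the value in between, the CAS fails and we retry.
    int incrementAndGet() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasCounter counter = new CasCounter();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.incrementAndGet();
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // No increments are lost, because each successful CAS applies exactly one update.
        System.out.println(counter.get());
    }
}
```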

Intrinsics are pre-known to the JVM and depend on being supported by specific features of the operating system or CPU architecture. This makes them platform-specific and not all intrinsics are supported on every platform.

In general, intrinsics should be recognised as point fixes and not general techniques. They have the advantage that they are powerful, lightweight and flexible, but have potentially high development and maintenance costs as they must be supported across multiple architectures.

Therefore, despite the progress being made in intrinsics, for all intents and purposes, C2 has reached the end of its lifecycle and must be replaced.

Oracle recently announced the first release of GraalVM, a research project that may in time lead to a replacement for HotSpot in its entirety.

For Java developers, Graal can be thought of as several separate but connected projects - it is a new JIT compiler for HotSpot, and also a new polyglot virtual machine. We will refer to the JIT compiler as Graal and the new VM as GraalVM.

The overall aim of the Graal effort is a rethinking of how compilation works for Java (and in the case of GraalVM for other languages as well). The basic observation that Graal starts from is very simple:

A (JIT) compiler for Java transforms bytecode to machine code - in Java terms it is just a transformation from a byte[] to another byte[] - so what would happen if the transforming code was written in Java?
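
To make the byte[]-to-byte[] idea concrete, here is a deliberately tiny sketch of a "compiler" pass written in Java. Everything in it is an assumption made for illustration: the two-opcode instruction set (PUSH and ADD), the class name ToyFolder and the constant-folding rule are invented, and none of this reflects how Graal is actually structured. It only shows that such a transformation is ordinary Java code operating on byte arrays.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical toy: folds "PUSH a, PUSH b, ADD" into "PUSH (a+b)".
final class ToyFolder {
    static final byte PUSH = 0x01; // made-up opcode, followed by one operand byte
    static final byte ADD = 0x02;  // made-up opcode, no operand

    // Single pass over the input; each step consumes whole instructions,
    // so operand bytes are never mistaken for opcodes.
    static byte[] compile(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            boolean foldable = i + 4 < in.length
                    && in[i] == PUSH && in[i + 2] == PUSH && in[i + 4] == ADD;
            if (foldable) {
                out.write(PUSH);
                // write(int) keeps only the low 8 bits, so the sum wraps like a byte.
                out.write(in[i + 1] + in[i + 3]);
                i += 5;
            } else if (in[i] == PUSH && i + 1 < in.length) {
                // Copy PUSH together with its operand.
                out.write(in[i]);
                out.write(in[i + 1]);
                i += 2;
            } else {
                out.write(in[i]);
                i += 1;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] program = {PUSH, 2, PUSH, 3, ADD};
        System.out.println(Arrays.toString(compile(program))); // prints [1, 5]
    }
}
```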

It turns out that there are some major advantages to writing a compiler in Java, such as:

  • Much lower barriers to entry for new compiler engineers
  • Memory safety in the compiler
  • Able to leverage the mature Java tooling space for compiler development
  • Much faster prototyping of new compiler features
  • The compiler could be independent of HotSpot
  • The compiler would be capable of compiling itself, to produce a faster, JIT-compiled version of itself

Graal uses the new JVM Compiler Interface (JVMCI, delivered as JEP 243) to plug in to HotSpot, but it can also be used as a major part of GraalVM. The technology is present and shipping today, although in Java 10 it is still very much an experimental technology. The switches to enable the new JIT compiler are:

-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler

This means that there are three different ways that we could run a simple program: with the regular tiered compilers, with the JVMCI version of Graal on Java 10, or with GraalVM itself.

To see the effect of Graal, let's use a simple example, which is nevertheless long-running enough to see the compiler start up - simple string hashing:

package kathik;

public final class StringHash {

    public static void main(String[] args) {
        StringHash sh = new StringHash();
        sh.run();
    }

    void run() {
        for (int i=1; i<2_000; i++) {
            timeHashing(i, 'x');
        }
    }

    void timeHashing(int length, char c) {
        final StringBuilder sb = new StringBuilder();
        for (int j = 0; j < length  * 1_000_000; j++) {
            sb.append(c);
        }
        final String s = sb.toString();
        final long now = System.nanoTime();
        final int hash = s.hashCode();
        final long duration = System.nanoTime() - now;
        System.out.println("Length: "+ length +" took: "+ duration +" ns");
    }
}

We can execute this code with the PrintCompilation flag set in the usual way to see what methods are compiled (it also provides a baseline to compare against for the Graal runs):

java -XX:+PrintCompilation -cp target/classes/ kathik.StringHash > out.txt

To see the effect of Graal as a compiler running on Java 10:

java -XX:+PrintCompilation \
     -XX:+UnlockExperimentalVMOptions \
     -XX:+EnableJVMCI \
     -XX:+UseJVMCICompiler \
     -cp target/classes/ \
     kathik.StringHash > out-jvmci.txt

and for GraalVM (the command is the same as the baseline, but java here is the binary from the GraalVM install):

java -XX:+PrintCompilation \
     -cp target/classes/ \
     kathik.StringHash > out-graal.txt

These commands generate three output files. Truncated to the output from the first 200 iterations of timeHashing(), they look something like this:

$ ls -larth out*
-rw-r--r--  1 ben  staff    18K  4 Jun 13:02 out.txt
-rw-r--r--  1 ben  staff   591K  4 Jun 13:03 out-graal.txt
-rw-r--r--  1 ben  staff   367K  4 Jun 13:03 out-jvmci.txt

As expected, the runs using Graal create a lot more output, because far more methods show up in the PrintCompilation log. This should not be at all surprising: Graal is itself written in Java, so the JIT compiler is one of the first things to be compiled, and there is a lot of JIT compiler warmup in the first few seconds after VM start.

Let's look at some of the early JIT output from the Java 10 run using the Graal compiler (in the usual PrintCompilation format):

$ grep graal out-jvmci.txt | head
    229  293       3       org.graalvm.compiler.hotspot.HotSpotGraalCompilerFactory::adjustCompilationLevelInternal (70 bytes)
    229  294       3       org.graalvm.compiler.hotspot.HotSpotGraalCompilerFactory::checkGraalCompileOnlyFilter (95 bytes)
    231  298       3       org.graalvm.compiler.hotspot.HotSpotGraalCompilerFactory::adjustCompilationLevel (9 bytes)
    353  414   !   1       org.graalvm.compiler.serviceprovider.JDK9Method::invoke (51 bytes)
    354  415       1       org.graalvm.compiler.serviceprovider.JDK9Method::checkAvailability (37 bytes)
    388  440       1       org.graalvm.compiler.hotspot.HotSpotForeignCallLinkageImpl::asJavaType (32 bytes)
    389  441       1       org.graalvm.compiler.hotspot.word.HotSpotWordTypes::isWord (31 bytes)
    389  443       1       org.graalvm.compiler.core.common.spi.ForeignCallDescriptor::getResultType (5 bytes)
    390  445       1       org.graalvm.util.impl.EconomicMapImpl::getHashTableSize (43 bytes)
    390  447       1       org.graalvm.util.impl.EconomicMapImpl::getRawValue (11 bytes)


Small experiments like this should be treated somewhat cautiously. For example, the effects of screen I/O with so much compilation early on may distort warm up performance. Not only that, but over time the buffers allocated for the ever-increasing strings will get so large that they will have to be allocated in the Humongous Regions (special regions reserved by the G1 collector for large objects only) - as both Java 10 and GraalVM use the G1 collector by default. This means that the G1 garbage collection profile will be dominated by G1 Humongous collections after some time, which is not at all a usual circumstance.
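
To get a feel for when the buffers in this example cross into humongous territory, the rule of thumb is that G1 treats an object as humongous once it is at least half a region in size. The sketch below applies that rule to the string lengths used in StringHash. The 4 MB region size is an assumption for illustration (G1 picks a region size between 1 MB and 32 MB based on the heap, or it can be set with -XX:G1HeapRegionSize), and the byte counts are approximations that assume one byte per character and ignore object headers.

```java
// Hypothetical helper: estimates which buffers G1 would place in Humongous Regions.
final class HumongousCheck {

    // G1's rule: an object is humongous if it occupies at least half a region.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long regionBytes = 4L * 1024 * 1024; // assumed region size: 4 MB
        for (int length : new int[] {1, 2, 3, 10}) {
            // Approximate backing array size for length * 1_000_000 Latin-1 chars.
            long approxBytes = length * 1_000_000L;
            System.out.println(length + "M chars -> humongous: "
                    + isHumongous(approxBytes, regionBytes));
        }
    }
}
```

Under these assumptions only the first couple of iterations stay below the threshold, which is consistent with the observation that humongous allocations soon dominate the run.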

Before discussing GraalVM, it's worth noting that there is one other way in which the Graal compiler can be used in Java 10 - the Ahead-of-Time compiler mode.

Recall that Graal (as a compiler) has been written from scratch as a brand new compiler that conforms to a new clean interface (JVMCI). This design means that Graal can integrate with HotSpot, but is not bound to it.

Rather than using a profile-driven approach to compile only the hot methods, we could consider using Graal to do a total compilation of all methods in an offline mode without executing the code. This is the capability referred to in "Ahead-of-Time Compilation", JEP 295.

Within the HotSpot environment, we can use this to produce a shared object / library (.so on Linux or a .dylib on Mac) like this:

$ jaotc --output libStringHash.dylib kathik/StringHash.class

We can then use the compiled code in future runs:

$ java -XX:AOTLibrary=./libStringHash.dylib kathik.StringHash

This use of Graal has only a single goal - to speed up startup time until the regular Tiered Compilation approach in HotSpot can take over. In absolute terms, on a full-size application, JIT compilation is expected to be able to outperform AOT compiled code in real benchmarks, although the details are dependent on workload.

The AOT compilation technology is still bleeding-edge, and technically is only supported (even experimentally) on Linux / x64. For example, when trying to compile the java.base module on Mac, the following errors occur (although a .dylib is still produced):

$ jaotc --output libjava.base.dylib --module java.base
Error: Failed compilation: sun.reflect.misc.Trampoline.invoke(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;: org.graalvm.compiler.java.BytecodeParser$BytecodeParserError: java.lang.Error: Trampoline must not be defined by the bootstrap classloader
       at parsing java.base@10/sun.reflect.misc.Trampoline.invoke(MethodUtil.java:70)
Error: Failed compilation: sun.reflect.misc.Trampoline.<clinit>()V: org.graalvm.compiler.java.BytecodeParser$BytecodeParserError: java.lang.NoClassDefFoundError: Could not initialize class sun.reflect.misc.Trampoline
       at parsing java.base@10/sun.reflect.misc.Trampoline.<clinit>(MethodUtil.java:50)

These errors can be controlled by using a file of compiler directives to exclude certain methods from AOT compilation (see the JEP 295 page for more details).

Despite the compiler errors, we can still try to execute the AOT-compiled base module code alongside the user code, like this:

java -XX:+PrintCompilation \
     -XX:AOTLibrary=./libStringHash.dylib,libjava.base.dylib \
     kathik.StringHash

By passing the PrintCompilation flag we can see how much JIT compilation activity is produced - and it is now almost none at all. Only some truly core methods needed for the initial bootstrap are now JIT-compiled:

   111    1     n 0       java.lang.Object::hashCode (native)  
   115    2     n 0       java.lang.Module::addExportsToAllUnnamed0 (native)   (static)

As a result, we can conclude that our simple Java app is now running in an almost 100% AOT-compiled form.

Turning to GraalVM, let's look at one of the headline features that the platform offers - the ability to fully embed polyglot languages in Java apps running inside GraalVM.

This can be thought of as an equivalent to, or replacement for, JSR 223 (Scripting for the Java Platform), but the Graal approach goes much further and deeper than comparable technologies in previous HotSpot capabilities.

The feature relies on GraalVM and the Graal SDK - which is provided as part of the GraalVM default classpath but should be included explicitly in IDE projects, e.g. as:

<dependency>
    <groupId>org.graalvm</groupId>
    <artifactId>graal-sdk</artifactId>
    <version>1.0.0-rc1</version>
</dependency>

The simplest example is a Hello World - let's use the JavaScript implementation as GraalVM ships this by default:

package kathik;

import org.graalvm.polyglot.Context;

public class HelloPolyglot {
    public static void main(String[] args) {
        System.out.println("Hello World: Java!");
        Context context = Context.create();
        context.eval("js", "print('Hello World: JavaScript!');");
    }
}

This runs as expected on GraalVM, but trying to run it on top of Java 10, even supplying the Graal SDK, produces this (unsurprising) error:

$ java -cp target/classes:$HOME/.m2/repository/org/graalvm/graal-sdk/1.0.0-rc1/graal-sdk-1.0.0-rc1.jar kathik.HelloPolyglot
Hello Java!
Exception in thread "main" java.lang.IllegalStateException: No language and polyglot implementation was found on the classpath. Make sure the truffle-api.jar is on the classpath.
       at org.graalvm.polyglot.Engine$PolyglotInvalid.noPolyglotImplementationFound(Engine.java:548)
       at org.graalvm.polyglot.Engine$PolyglotInvalid.buildEngine(Engine.java:538)
       at org.graalvm.polyglot.Engine$Builder.build(Engine.java:367)
       at org.graalvm.polyglot.Context$Builder.build(Context.java:528)
       at org.graalvm.polyglot.Context.create(Context.java:294)
       at kathik.HelloPolyglot.main(HelloPolyglot.java:8)

This means that Truffle is restricted to run only on GraalVM (at least for the moment).

A form of polyglot capability has existed since Java 6, with the introduction of the Scripting API. It was significantly enhanced in Java 8 with the arrival of Nashorn, the invokedynamic-based implementation of JavaScript.
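
For comparison, the older Scripting API can be explored with nothing but the JDK. The sketch below simply lists whatever script engines the running JVM can find, which is a safe way to start because the result varies by JDK version: Nashorn was bundled from Java 8 but removed in JDK 15, so on recent JDKs the list may well be empty. The class name ListScriptEngines is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

// Hypothetical demo class: reports the JSR 223 script engines visible to this JVM.
final class ListScriptEngines {

    // Returns one "engine name (language name)" entry per discovered engine factory.
    static List<String> engineNames() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            names.add(factory.getEngineName() + " (" + factory.getLanguageName() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = engineNames();
        System.out.println(names.isEmpty() ? "No script engines found" : names.toString());
    }
}
```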

What sets the technology in GraalVM apart is that the ecosystem now explicitly includes an SDK and supporting tools for implementing multiple languages and having them running as co-equal and interoperable citizens on the underlying VM.

The keys to this step forward are the component called Truffle and a simple, bare-bones VM, SubstrateVM, capable of executing JVM bytecode.

Truffle provides an SDK and tools for creating new language implementations. The general approach is:

  • Start from a language grammar
  • Apply a parser generator (e.g. Coco/R)
  • Use Maven to build an interpreter and simple language runtime
  • Run the resulting language implementation on top of GraalVM
  • Wait for Graal (in JIT mode) to kick in to automatically enhance performance of the new language
  • [Optional] Use Graal in AOT mode to compile the interpreter to a native launcher

Out of the box, GraalVM ships with JVM bytecode, JavaScript and LLVM support. If we try to call another language, such as Ruby, like this:

context.eval("ruby", "puts \"Hello World: Ruby\"");

then GraalVM throws a runtime exception:

Exception in thread "main" java.lang.IllegalStateException: A language with id 'ruby' is not installed. Installed languages are: [js, llvm].
       at com.oracle.truffle.api.vm.PolyglotEngineImpl.requirePublicLanguage(PolyglotEngineImpl.java:559)
       at com.oracle.truffle.api.vm.PolyglotContextImpl.requirePublicLanguage(PolyglotContextImpl.java:738)
       at com.oracle.truffle.api.vm.PolyglotContextImpl.eval(PolyglotContextImpl.java:715)
       at org.graalvm.polyglot.Context.eval(Context.java:311)
       at org.graalvm.polyglot.Context.eval(Context.java:336)
       at kathik.HelloPolyglot.main(HelloPolyglot.java:10)

To use the (currently still beta) Truffle version of Ruby (or another language), we need to download and install it. For Graal version RC1 (soon to be replaced by RC2), this is achieved by:

gu -v install -c org.graalvm.ruby

Note that this will require sudo if GraalVM has been installed system-wide as a standard $JAVA_HOME for multiple users. If using the non-OSS EE edition of GraalVM (the only one currently available for Mac), then this can be taken one step further - and the Truffle interpreter can be converted into native code.

Rebuilding the native image (launcher) for the language will improve performance, but this requires using the command line tools, like this (assuming GraalVM was installed system-wide, and so needs root):

$ cd $JAVA_HOME
$ sudo jre/lib/svm/bin/rebuild-images ruby

This is still in development, and has a few manual steps, but the development team is hoping to make the process smoother over time.

If any problems are encountered with rebuilding the native components, not to worry - it should still work without rebuilding native images.

Let's see a more complex example for polyglot coding:

Context context = Context.newBuilder().allowAllAccess(true).build();
Value sayHello = context.eval("ruby",
        "class HelloWorld\n" +
        "   def hello(name)\n" +
        "      \"Hello #{name}\"\n" +
        "   end\n" +
        "end\n" +
        "hi = HelloWorld.new\n" +
        "hi.hello(\"Ruby\")\n");
String rubySays = sayHello.as(String.class);
Value jsFunc = context.eval("js",
        "function(x) print('Hello World: JavaScript with '+ x +'!');");
jsFunc.execute(rubySays);

This code is a little hard to read, but it uses both TruffleRuby and JavaScript. First, we call this bit of Ruby code:

class HelloWorld
   def hello(name)
      "Hello #{name}"
   end
end

hi = HelloWorld.new
hi.hello("Ruby")

This creates a new Ruby class, defines a method on it, and then instantiates a Ruby object and finally calls the hello() method on it. This method returns a (Ruby) string, which is coerced to a Java string back in the Java runtime.

We then create a simple JavaScript anonymous function, which looks like this:

function(x) print('Hello World: JavaScript with '+ x +'!');

We call this function via execute() and pass in the result of our Ruby call into the function, which prints it out, from within the JS runtime.

Note that when we created the Context object, we needed to allow extended access to the context. This is for Ruby - we didn't need it for JS - hence the more complex construction during setup. This is a limitation of the current Ruby implementation, and may be removed in future.

Let's look at one final polyglot example, to see how far we can take this:

Value sayHello = context.eval("ruby",
        "class HelloWorld\n" +
        "   def hello(name)\n" +
        "      \"Hello Ruby: #{name}\"\n" +
        "   end\n" +
        "end\n" +
        "hi = HelloWorld.new\n" +
        "hi");
Value jsFunc = context.eval("js",
        "function(x) print('Hello World: JS with '+ x.hello('Cross-call') +'!');");
jsFunc.execute(sayHello);

In this version, we're returning an actual Ruby object, not just a String. Not only that, but we're not coercing it to any Java type, and instead are passing it straight to this JS function:

function(x) print('Hello World: JS with '+ x.hello('Cross-call') +'!');

It works, and produces the expected output:

Hello World: Java!
Hello World: JS with Hello Ruby: Cross-call!

This means that the JS runtime can call a foreign method on an object in a separate runtime, with seamless type conversion (at least for simple cases).

This ability to have fungibility across languages that have very different semantics and type systems has been discussed among JVM engineers for a very long time (at least 10 years), and with the arrival of GraalVM it has taken a very significant step towards the mainstream.

Let's have a quick look at how these foreign objects are represented in GraalVM, by using this bit of JS to just print out the incoming Ruby object:

function(x) print('Hello World: JS with '+ x +'!');

This outputs the following (or similar):

Hello World: JS with foreign {is_a?: DynamicObject@540a903b<Method>, extend: DynamicObject@238acd0b<Method>, protected_methods: DynamicObject@34e20e6b<Method>, public_methods: DynamicObject@15ac59c2<Method>, ...}!

showing that the foreign object is represented as a bag of DynamicObject objects, which will delegate the semantic operations, in many cases back to the home runtime for the object.

To conclude this article, we should say a word about benchmarks and licensing. It must be clearly understood that, despite their enormous promise, Graal and GraalVM are currently still early-stage, experimental technology.

It is not yet optimized or productionized for general-purpose use cases, and it will take time to reach parity with HotSpot / C2. Microbenchmarks are also often misleading - they can point the way in some circumstances, but in the end only user-level benchmarks of entire production applications matter for performance analysis.
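
One reason naive microbenchmarks mislead is JIT warmup: the same code is typically much slower in its first few rounds than once it has been compiled. The sketch below times identical work over several rounds so the effect can be seen directly. It is a crude illustration under stated assumptions, not a benchmarking harness: the names (WarmupTimer, work, timeRounds) are made up, the timings vary from machine to machine, and for real measurements a purpose-built tool such as JMH is the better choice.

```java
// Hypothetical demo class: times the same workload repeatedly to expose warmup.
final class WarmupTimer {

    // Some arbitrary CPU-bound work, loosely modelled on a hash computation.
    static long work(int n) {
        long h = 0;
        for (int i = 0; i < n; i++) {
            h = 31 * h + i;
        }
        return h;
    }

    // Runs work(n) once per round and records the elapsed nanoseconds of each round.
    static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        long sink = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += work(n);
            nanos[r] = System.nanoTime() - start;
        }
        // Use the result so the JIT cannot discard the work as dead code.
        if (sink == 42) {
            System.out.println("unlikely");
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 5_000_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("Round " + r + " took: " + nanos[r] + " ns");
        }
    }
}
```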

One way to think about this is that C2 is essentially a local maximum of performance and is at the end of its design lifetime. Graal gives us the opportunity to break out of that local maximum and move to a new, better region - and potentially rewrite a lot of what we thought we knew about VM design and compilers along the way. It's still immature tech though - and it is very unlikely to be fully mainstream for several more years.

This means that any performance tests undertaken today should be analysed with real caution. Comparative performance tests (especially HotSpot+C2 vs GraalVM) are comparing apples to oranges - a mature, production-grade runtime vs a very early stage experimental one.

It also needs to be pointed out that the licensing regime for GraalVM might well be different to any seen so far. When Oracle bought Sun, they acquired HotSpot as an existing and very mature product, licensed as Free Software. There were limited attempts to add value and monetize on top of the HotSpot core product - e.g. the UnlockCommercialFeatures switch. With the retirement of these features (e.g. the open-sourcing of Mission Control), it's fair to say that model was not a huge commercial success.

Graal is different - it started life as an Oracle Research project that is now moving towards a production product. Oracle has invested large sums in making Graal a reality - and the individuals and teams needed for the project are in short supply and scarcely cheap. As it is based on different underlying technology, Oracle is at liberty to use a different commercial model to HotSpot, and to try to monetise GraalVM across a greater range of customers - including those who do not currently pay for the HotSpot runtime. It is even possible that Oracle may decide that some features of GraalVM will only be made available to customers running on Oracle Cloud.

For now, Oracle is shipping a GPL-licensed Community Edition (CE), which is free for dev and production use, and an Enterprise Edition (EE) which is free for dev and evaluation use. Both versions can be downloaded from Oracle's GraalVM site, where further detailed information can also be found.

About the Author

Ben Evans is a co-founder of jClarity, a JVM performance optimization company. He is an organizer for the LJC (London's JUG) and a member of the JCP Executive Committee, helping define standards for the Java ecosystem. Ben is a Java Champion; 3-time JavaOne Rockstar Speaker; author of "The Well-Grounded Java Developer", the new edition of "Java in a Nutshell" and "Optimizing Java". He is a regular speaker on the Java platform, performance, architecture, concurrency, startups and related topics. Ben is sometimes available for speaking, teaching, writing and consultancy engagements - please contact for details.

Small clarification required by Raghavendra Balgi

Thanks Ben for this excellent introduction to Graal. A clarification on the commands used in this post to demonstrate differences between the two options - the usual (java10 way) and running on the graal VM.

The commands look identical in both cases.

java -XX:+PrintCompilation -cp target/classes/ kathik.StringHash > out.txt
java -XX:+PrintCompilation \
-cp target/classes/ \
kathik.StringHash > out-graal.txt

I haven't downloaded graal yet, but I'm assuming that "java" part of the command refers to different binaries?

Interesting read by Peter Veentjer

Very interesting and informative read. It will be interesting to see how Graal matches up against the Azul Falcon/LLVM compiler.

Re: Small clarification required by Ben Evans

Hi Raghavendra - thanks for the feedback.

Yes - the binary is just called 'java' in all cases. Personally, I manage my Java versions via having multiple installs in /Library/Java/JavaVirtualMachines (I currently have java7, java8, java9, java10, graal and valhalla as symlinks to various builds) and then just use a .bash_profile to set my $JAVA_HOME at startup. A bit crude, but it does work!

A real bummer that oracle is being this by Pascal Chouinard

This is why we can't run it for free on Mac...

We have to get the paid enterprise version for Mac OS

*thumbsdown*

Re: A real bummer that oracle is being this by Ben Evans

Not true. The above examples were generated on a Mac, and I didn't pay a penny - I just downloaded the enterprise edition & agreed to the license. Technically, that's the same as people have always done for OracleJDK binary (which is not F/OSS at all).

I also hear rumours that CE is potentially coming to Mac too...

Re: A real bummer that oracle is being this by Fred Curts

CE for macOS is coming soon, and until it arrives, it can be built from source.
