InfoQ

News

Java 6 Hotspot Performance

Posted by Charles Humble on May 07, 2008 04:05 PM

Community
Java
Topics
Performance & Scalability ,
Compilers
Using the JDK6 u10 b14 debug build, Sun Microsystems' Kohsuke Kawaguchi examined the assembly code produced by the Hotspot JIT in a recent blog post. The post highlights just how far Java optimization has come.

Kawaguchi focuses on two main areas. The first is loop unrolling, a technique where the instructions called in each iteration of the loop are replicated to form a single sequence. This saves time by reducing the number of overhead instructions that the computer has to execute in a loop. JIT combines this with a warm up and cool down and Kawaguchi's commentary highlights the fact that the compiler has removed a redundant array index check from the fast part of the loop. Moreover the resulting assembly code demonstrates how much processor specific optimization is going on. Kawaguchi notes, for instance, that the following code:

private static byte[] foo() {
byte[] buf = new byte[256];
for( int i=0; i<buf.length; i++ )
buf[i] = 0;
return buf;
}

results in assembly using the R8-R15 general purpose registers specific to the AMD64 chip.

The second area Kawaguchi's blog examines is the optimizations that have been made around locks. Whilst uncontended lock acquisition has been steadily improving in Java for some time, contended acquisition has remained problematic. Work in this area is still on-going but Kawaguchi's work does highlight several areas that have improved.

The article shows a number of other features of the Hotspot compiler, including how aggressive the inlining is - James Gosling notes in a corresponding blog post that "even storage allocation & initialization gets inlined". Part of the reason this level of aggression is possible is that the JVM is able to make potentially unsafe optimizations where necessary. Charles Nutter provides a good explanation of this in a post on attending the Lang .NET conference from earlier in the year. His post also highlights the relevance of this work to JRuby, and by implication any language which targets the JVM.

"The JVM has over time had varying capabilities to dynamically optimize and reoptimize code...and perhaps most importantly, to dynamically *deoptimize* code when necessary. Deoptimization is very exciting when dealing with performance concerns, since it means you can make much more aggressive optimizations--potentially unsafe optimizations in the uncertain future of an entire application--knowing you'll be able to fall back on a tried and true safe path later on. So you can do things like inlining entire call paths once you've seen the same path a few times. You can do things like omitting synchronization guards until it becomes obvious they're needed. And you can change the set of optimizations applied after the fact...in essence, you can safely be "wrong" and learn from your mistakes at runtime. This is the key reason why Java now surpasses C and C++ in specific handcrafted benchmarks and why it should eventually be able to exceed C and C++ in almost all benchmarks. And it's a key reason why we're able to get acceptable performance out of JRuby having done far less work than Microsoft has done on IronPython and the DLR."

It has always been theoretically possible that an interpreted language like Java would ultimately surpass the performance of a complied language, since it is able to make optimizations at run-time based on the available hardware, and the increasing use of processor specific optimization in Java is a particularly exciting development in this regard. For the developer targeting the Java platform a significant plus point is that with each new release of the Java compiler code performance improves without any changes needing to be made to the application's source.

corrected listing... by Mirko Jahn Posted May 8, 2008 4:29 AM
Re: corrected listing... by Charles Humble Posted May 8, 2008 1:27 PM
  1. Back to top

    corrected listing...

    May 8, 2008 4:29 AM by Mirko Jahn

    As you can see in the mentioned post, the correct syntax for the method should look like the following:

    private static byte[] foo() {
        byte[] buf = new byte[256];
        for( int i=0; i<buf.length; i++ )
            buf[i] = 0;
        return buf;
    }
    
    (just a problem with the interpretation of the < sign in the post.)

  2. Back to top

    Re: corrected listing...

    May 8, 2008 1:27 PM by Charles Humble

    Many thanks for pointing this - have corrected the post.
    Charles

Educational Content

Bindings, Platforms, and Innovation

This presentation focuses on the Internet and separating myth from fact, history from the future, and the mundane from the imaginative. Bob Frankston presents a vision of what could and should be.

Orchestrating Long Running Activities with JBoss / JBPM

This article explores the use of JBoss and jBPM to implement design solutions that effectively address the issue of orchestrating long running activities.

Neo4j - The Benefits of Graph Databases

This presentation covers the use of graph databases as an optimal solution for data that is difficult to fit in static tables, rapidly evolving data or data that has a lot of optional attributes.

Realistic about Risk: Software development with Real Options

This session introduces Real Options and shows how it can help in running your project. Real Options is a decision-making process that can be used to manage risk.

Communication Flexibility Using Bindings

This article discusses the use of bindings on services and references (including the instance of non-configured bindings) as the means to implement SCA communications in a Web and SOA environment.

Writing DSLs in Groovy

After a short introduction to DSLs, Scott Davis plays with the keyboard showing how to approach the creation of a DSL by typing working snippets of Groovy code that get executed.

Scaling Agile with C/ALM (Collaborative Application Lifecycle Management)

IBM Rational and InfoQ present, Scaling Agile with C/ALM, an eBook showing organizations how to become “finely tuned software delivery machines” by enabling team integration and scaling.

Concurrent Programming with Microsoft F#

Amanda Laucher presents a real life enterprise application written in F#. She shows actual code snippets, explaining design decisions and suggesting how to use some of the F# constructs.