New Renaissance Performance Benchmark Aims to Compare JVMs

Researchers from Charles University, Oracle Labs, and several other universities have released Renaissance, a new benchmark suite for performance testing of Java Virtual Machines. The benchmark enables developers working on a JVM to measure performance between releases and to better understand how applications will perform on that particular JVM.

Renaissance focuses on testing the concurrency features released between Java 8 (2014) and Java 12 (2019). Its tests expand upon those used by other benchmarks, such as DaCapo and SPECjvm2008. Overall, there are 21 parallel and concurrency-oriented benchmarks covering both Java and Scala code. Another popular test suite for benchmarking JVMs is SPECjbb2015, which IBM uses to demonstrate the impact of hardware changes on Java performance. A minimal, illustrative sketch of the kind of parallel workload such benchmarks exercise follows below.
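
To give a sense of the kind of workload these benchmarks target, here is a minimal Java sketch using the parallel streams API introduced in Java 8. It is purely illustrative and is not taken from the Renaissance suite:

```java
import java.util.stream.LongStream;

// Illustrative parallel-streams workload, similar in spirit to the
// concurrency-oriented benchmarks in Renaissance (not from the suite itself).
public class ParallelWorkload {

    // Count primes below the bound using a work-stealing parallel stream,
    // the fork/join-backed parallelism introduced in Java 8.
    static long countPrimes(long bound) {
        return LongStream.range(2, bound)
                .parallel()
                .filter(ParallelWorkload::isPrime)
                .count();
    }

    static boolean isPrime(long n) {
        for (long i = 2; i * i <= n; i++) {
            if (n % i == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long primes = countPrimes(2_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("Found %d primes in %d ms%n", primes, elapsedMs);
    }
}
```

A suite like Renaissance wraps comparable workloads in a harness that handles JIT warm-up and repeated measurement; a naive one-shot timing like the one above says little about steady-state JVM performance.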

Five of the twelve authors on the Renaissance whitepaper work on GraalVM at Oracle Labs. GraalVM is a new polyglot virtual machine designed to run many languages, including Java, on a single runtime. It is available in both a community edition and a commercial enterprise edition. Each edition features two modes of operation: HotSpot mode and Native Image. HotSpot mode is named after OpenJDK's HotSpot implementation and is fully compliant with OpenJDK, passing the Java Technology Compatibility Kit. GraalVM's other mode of operation, Native Image, compiles Java applications into native machine code under a closed-world assumption; it does not meet the definition of compatibility for Java SE for several reasons, but it can run applications that fit that closed-world assumption.
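
One common example of code that conflicts with a closed-world model is reflective loading of a class whose name is only known at run time. The sketch below illustrates the pattern; the class name com.example.PluginImpl is hypothetical, and under Native Image such code typically requires the class to be declared to the image builder ahead of time via reflection configuration:

```java
import java.lang.reflect.Method;

// Illustrative sketch: loading a class by a name computed at run time.
// A closed-world, ahead-of-time compiler cannot see the target class
// statically, so this pattern needs explicit configuration under
// GraalVM Native Image. The class name below is hypothetical.
public class DynamicLoading {
    public static void main(String[] args) throws Exception {
        String className = args.length > 0 ? args[0] : "com.example.PluginImpl";
        Class<?> cls = Class.forName(className);          // resolved at run time
        Object plugin = cls.getDeclaredConstructor().newInstance();
        Method run = cls.getMethod("run");
        run.invoke(plugin);                                // reflective dispatch
    }
}
```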

Nikita Lipsky, senior software engineer with Excelsior, raised the question of Native Image compatibility, citing rules defined by Sun Microsystems in 2004: "One example is that there is a Rule that requires that a product be compatible in 'all configurations'. You can't have a special configuration that you use to pass the tests, but then encourage your customers to use other configurations that are actually subtly incompatible. Yes, someone tried that trick once."

InfoQ communicated with Oracle to verify that the numbers published in the Renaissance benchmark were generated in the compatible HotSpot mode, and were therefore comparable to other JVMs.

Overall, the performance reported for the GraalVM community edition was comparable to OpenJDK, while the GraalVM enterprise edition scored better. In the chart below, higher is better.
[Chart: Renaissance Results]

The release of Renaissance has caused friction with vendors of other Java implementations, who were not involved with or consulted about what the benchmark would measure. Unlike SPEC, which manages an environment where competing vendors agree on a fair baseline, only one VM vendor participated in Renaissance, through GraalVM. Although seven of the twelve authors come from universities, no competing vendor took part. "Benchmark game is about credibility, and vendor benchmarks are riddled with conflict of interest. New benchmarking suites -- doubly so. New benchmarking suites that show up together with (specially optimized) vendor product results -- triply so," states Aleksey Shipilev, a performance expert at Red Hat who works on the Shenandoah garbage collector.

"The timing of the 'new benchmark' creation is probably not an accident. But there is also nothing wrong with that. Time will tell if this is a real benchmark that can be used to compare JVM performance," explains Gil Tene, CTO of Azul. "A race is not a race until competitors show up and stand at the same starting line. If [Renaissance] is a real benchmark, it will stay the same and not ‘adapt and zig zag,’ and thus allow others to start reproducing stable baseline results, and then start measuring results that actually compare things across implementations."

Shipilev cites the difficulty of simply running the benchmark as-is as a barrier to defining that starting line: "Saying the suite is 'open-source and can be changed', 'the benchmarks are fairly selected' is disingenuous and does not resolve the current problem with it."
