The Challenges in Java Benchmarking

Brent Boyer, a programmer at Elliptic Group, published a two-part article series on IBM's DeveloperWorks entitled Robust Java Benchmarking that explores the challenges in capturing effective Java benchmarks. He begins by discussing various JVM idiosyncrasies and optimizations in current compilers that might negatively affect performance testing. For example, he shows that if a complicated section of code computes a value that is never used, that an aggressive compiler might optimize out the section of code and hence the benchmark will not include it. To illustrate this point, one of his code snippets consistently runs in 4.9 seconds on his computer, but if he omits a println that displays the result, it consistently runs in 0.08 seconds. He also points out that the granularity of time measurements varies on different operating systems so benchmarking exercises need to be aware of this when interpreting the results. He shows that System.currentTimeMillis() is not a good measurement of elapsed time (as compared to System.nanoTime()) as it only has a 15ms accuracy on Windows XP (but it does have a 1ms accuracy in Linux with a 2.6 kernel.)

After addressing these peculiar behaviors, Boyer reviews more relevant problems that typical benchmarks might not consider, such as JVM caching and resource reclamations (e.g. garbage collection and object finalization.) He proposes that the only way to effectively avoid these problems is to "warm up" the code until it reaches a steady state. The warm up process can be time consuming and challenging because some JVMs can execute a method 10,000 times before it triggers a compilation (it will remain interpreted until then.) Once the code is in a steady state then the benchmark must run it several times and compute a statistical analysis of the results.

In addition to describing the problem, Boyer proposes the solution of adopting a benchmarking framework, one of which he has written and made available. Using his framework he shows the differences between accessing data contained within data structures (raw arrays, ArrayLists, Vectors, HashMap, TreeMap, and so forth) with varying number of contained elements. Boyer's analysis yields two interesting observations: (1) his benchmarking framework is able to report mean access times when those times are as quick as a few nanoseconds and (2) the behavior of some data structures is surprising under different loads. One peculiarity was in the behavior of the ConcurrentHashMaps when compared to a TreeMap: the CurrentHashMaps performed considerably better that the TreeMap with 1024 elements, but only marginally better with 1024x1024 elements. This is unexpected because hash maps have constant time search whereas trees have log(n) search. Regardless of the unexpected results, the article is worth reading and Boyer's concerns are worth considering when benchmarking Java code.

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Java topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter