After addressing these peculiar behaviors, Boyer reviews more relevant problems that typical benchmarks might not consider, such as JVM caching and resource reclamation (e.g., garbage collection and object finalization). He proposes that the only effective way to avoid these problems is to "warm up" the code until it reaches a steady state. Warming up can be time-consuming and challenging because some JVMs execute a method as many as 10,000 times before triggering its compilation; until then, the method remains interpreted. Once the code has reached a steady state, the benchmark must run it several times and perform a statistical analysis of the results.
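The warm-up-then-measure procedure described above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not Boyer's framework: the class `WarmupBenchmark`, its `measure` method, and its parameters are hypothetical, and the 20,000 warm-up calls in `main` are an arbitrary choice intended to exceed the 10,000-invocation compile threshold mentioned above. The `volatile` sink is one common way to discourage the JIT from discarding the measured work as dead code.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

/**
 * Hypothetical benchmarking sketch (not Boyer's framework): warm up a task,
 * then time several measurement runs and report per-call statistics.
 */
public class WarmupBenchmark {

    /** Mean and sample standard deviation of per-call times, in nanoseconds. */
    public static final class Stats {
        public final double meanNanos;
        public final double stdDevNanos;

        Stats(double meanNanos, double stdDevNanos) {
            this.meanNanos = meanNanos;
            this.stdDevNanos = stdDevNanos;
        }
    }

    // Results are written here so the JIT cannot treat the measured work as dead code.
    static volatile long sink;

    /**
     * Calls task warmupCalls times untimed, then performs `runs` timed batches of
     * callsPerRun calls each, returning the mean and standard deviation per call.
     */
    public static Stats measure(LongSupplier task, int warmupCalls, int runs, int callsPerRun) {
        long acc = 0;
        // Warm-up phase: gives the JIT a chance to compile the hot path before timing.
        for (int i = 0; i < warmupCalls; i++) {
            acc += task.getAsLong();
        }
        // Measurement phase: several independent timed runs, one sample per run.
        double[] perCall = new double[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < callsPerRun; i++) {
                acc += task.getAsLong();
            }
            perCall[r] = (double) (System.nanoTime() - start) / callsPerRun;
        }
        sink = acc;
        // Simple statistics over the per-run samples.
        double mean = Arrays.stream(perCall).average().orElse(0.0);
        double sumSq = 0.0;
        for (double x : perCall) {
            sumSq += (x - mean) * (x - mean);
        }
        double sd = runs > 1 ? Math.sqrt(sumSq / (runs - 1)) : 0.0;
        return new Stats(mean, sd);
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Assumption: 20,000 warm-up calls comfortably exceeds a ~10,000-call compile threshold.
        Stats s = measure(() -> {
            long sum = 0;
            for (int x : data) {
                sum += x;
            }
            return sum;
        }, 20_000, 10, 1_000);
        System.out.printf("mean = %.1f ns/call, sd = %.1f ns%n", s.meanNanos, s.stdDevNanos);
    }
}
```

Reporting the standard deviation alongside the mean gives a rough sense of run-to-run variation. A real framework would also handle issues this sketch ignores, such as detecting when a steady state has actually been reached and accounting for garbage collection during timed runs.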
In addition to describing the problem, Boyer proposes adopting a benchmarking framework, and he has written one himself and made it available. Using his framework, he compares the time needed to access data held in various data structures (raw arrays, ArrayLists, Vectors, HashMaps, TreeMaps, and so forth) containing varying numbers of elements. Boyer's analysis yields two interesting observations: (1) his benchmarking framework can report mean access times even when those times are as short as a few nanoseconds, and (2) some data structures behave surprisingly as the number of elements changes. One peculiarity was the behavior of ConcurrentHashMaps compared to a TreeMap: the ConcurrentHashMaps performed considerably better than the TreeMap with 1024 elements, but only marginally better with 1024x1024 elements. This is unexpected because hash maps offer expected constant-time lookup, whereas a TreeMap (a red-black tree) requires O(log n) comparisons per lookup. Regardless of the unexpected results, the article is worth reading, and Boyer's concerns are worth considering when benchmarking Java code.
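A rough way to set up the kind of comparison described above, timing successful `get()` calls on a `ConcurrentHashMap` and a `TreeMap` holding 1024 and 1024x1024 elements, is sketched below. It is a naive single-threaded loop with one untimed warm-up pass, not Boyer's framework; the class `MapLookupBench` and its methods are hypothetical, and any numbers it prints depend on the JVM, hardware, and heap settings, so they should not be read as reproducing Boyer's results.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not Boyer's framework): average time per successful
 * get() on a ConcurrentHashMap versus a TreeMap with n Integer keys.
 */
public class MapLookupBench {

    // Results are written here so the JIT cannot eliminate the lookups.
    static volatile long sink;

    /** Fills map with keys 0..n-1, each mapped to itself, and returns it. */
    static Map<Integer, Integer> fill(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        return map;
    }

    /** Average nanoseconds per get() over `rounds` timed passes through all n keys. */
    static double nanosPerGet(Map<Integer, Integer> map, int n, int rounds) {
        long acc = 0;
        // Untimed warm-up passes so the lookup loop is likely compiled before timing.
        for (int r = 0; r < rounds; r++) {
            for (int k = 0; k < n; k++) {
                acc += map.get(k);
            }
        }
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int k = 0; k < n; k++) {
                acc += map.get(k);
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / ((long) n * rounds);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1024, 1024 * 1024}) {
            // Keep the total number of timed lookups at about 2^22 for each size.
            int rounds = Math.max(1, (1 << 22) / n);
            Map<Integer, Integer> chm = fill(new ConcurrentHashMap<>(), n);
            Map<Integer, Integer> tree = fill(new TreeMap<>(), n);
            System.out.printf("n=%d  ConcurrentHashMap: %.1f ns/get  TreeMap: %.1f ns/get%n",
                    n, nanosPerGet(chm, n, rounds), nanosPerGet(tree, n, rounds));
        }
    }
}
```

Holding the total number of lookups roughly constant keeps the timed region for the small map long enough that `System.nanoTime()` granularity does not dominate. The 1024x1024 case builds two maps of about a million boxed entries each, so it may need a larger heap (for example via `-Xmx`).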