Article: Do Java 6 threading optimizations actually work?

by Floyd Marinescu on Jun 19, 2008
Much attention has been given by Sun, IBM, BEA and others to optimizing lock management and synchronization in their respective Java 6 virtual machine offerings. Features like biased locking, lock coarsening, lock elision by escape analysis and adaptive spin locking are all designed to increase concurrency by allowing more effective sharing amongst application threads. As sophisticated and interesting as each of these features is, the question is: will they actually make good on these promises? In this two-part article, Jeroen Borgers explores these features and attempts to answer the performance question with the aid of a single-threaded benchmark.
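As a rough illustration of the kind of code these optimizations target, here is a minimal sketch (not the benchmark from the article). A `StringBuffer` that never leaves the method is a candidate for lock elision, since no other thread can ever contend for its monitor. The class and method names are hypothetical, and whether a given JVM actually removes the locks is not guaranteed.

```java
public class ElisionCandidate {
    // StringBuffer.append is synchronized. Because sb never escapes this
    // method, a JIT with escape analysis may be able to drop the locking.
    static String concatBuffer(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }

    // Same work with the unsynchronized StringBuilder, for comparison.
    static String concatBuilder(String s1, String s2, String s3) {
        StringBuilder sb = new StringBuilder();
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both variants produce the same string; only the locking differs.
        String a = concatBuffer("a", "b", "c");
        String b = concatBuilder("a", "b", "c");
        System.out.println(a.equals(b));
    }
}
```

If lock elision works as advertised, the two methods should cost about the same in a single-threaded run; the article's question is whether that holds in practice.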

Read Do Java 6 threading optimizations actually work?


Lock elision only in Java 7 by Christoph Kutzinski

I remember reading that lock elision is only coming in Java 7; Java 6 just does the escape analysis without taking advantage of the results.
(I just found this article where this is mentioned, too).

Can anyone confirm this/give more details?

Premature conclusions.... by William Louth

Have you considered running the various benchmarks within the SPECjvm2008 benchmark suite with the options above? It might save you time in determining whether your numbers are correct and meaningful in a more realistic context.


Re: Lock elision only in Java 7 by Ben Loud

If you look at the OpenJDK HotSpot mailing list, you can see that escape analysis has been heavily worked on. It looks like it has been significantly improved, and even cooler is that it's now being used to eliminate allocations entirely. I expect things to be MUCH better in JDK7 (and probably also in a future JDK6 update).

Re: Premature conclusions.... by William Louth

By the way, if you are going to construct a micro-benchmark, then at least try not to measure other things such as the garbage collector and the memory allocator.

Re: Premature conclusions.... by Kirk Pepperdine

Hi William,

Thanks for the comments on the article. I think if you read the end of the article it gives a hint that there *is* something else going on. Stay tuned for part deux ;-)


Re: Premature conclusions.... by William Louth

Thanks Kirk for responding. By the way, I do not like the fact that the classes tested share a common superclass, especially with micro-benchmarks. I prefer for code to be duplicated across test classes. Again, let's have these figures for SPECjvm2008 (freely available) to see how it stacks up in the wild.

Microbenchmark by Michael Bien

I am wondering why both loops haven't been optimized away completely ;) The result is never used and both methods have no "side effects". I always store the results in arrays in my own benchmarks, because otherwise they sometimes take almost no time after warming up.
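A minimal sketch of what storing the results might look like, assuming a simple concatenation workload. The class and method names are hypothetical. Every result is kept in an array and folded into a checksum that is printed, so the JIT cannot treat the loop body as dead code.

```java
public class KeepResultsAlive {
    // The work under test: a small string concatenation.
    static String concat(String s1, String s2, String s3) {
        StringBuilder sb = new StringBuilder();
        sb.append(s1).append(s2).append(s3);
        return sb.toString();
    }

    // Runs the loop and stores every result so the work stays observable.
    static long run(int iterations) {
        String[] results = new String[iterations];
        for (int i = 0; i < iterations; i++) {
            results[i] = concat("a", "b", Integer.toString(i));
        }
        // Fold the stored results into a value that the caller uses.
        long checksum = 0;
        for (String r : results) {
            checksum += r.length();
        }
        return checksum;
    }

    public static void main(String[] args) {
        run(10_000); // warm-up pass so the timed pass runs compiled code
        long start = System.nanoTime();
        long checksum = run(100_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum=" + checksum + " elapsed=" + elapsedMs + "ms");
    }
}
```

One trade-off of this approach: holding every result in an array keeps the objects alive, so the measurement includes more allocator and garbage collector activity.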

little bit off topic:
If escape analysis really works that well, you also get a profile of the object life cycle for free. Couldn't this be used for fancy GC optimizations? I mean, why put the non-escaping objects on the heap if you already know how long you plan to reference them... (-> stack allocation?)

great article!

Re: Premature conclusions.... by Dieter Guendisch

William is absolutely right. Putting two benchmarks into one single class is useless, as you (or at least I :) never know which "state" the JVM is in after running the first micro-benchmark.
Btw., I just tried switching the order of the micro-benchmarks: LockTest2 runs the StringBuilder benchmark first and the StringBuffer benchmark second, with the following result: while the StringBuilder execution time was stable (LockTest and LockTest2 each took about 5 secs for StringBuilder), the outcome for StringBuffer was completely different: LockTest always took about 7 secs, while LockTest2 always took about 13 secs for StringBuffer!
That said, it would be interesting to see your benchmarks with the "loop" over these two micro-benches implemented in some batch script rather than in the same Java program :)
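One way to get that isolation without leaving Java is to launch each benchmark in its own JVM. This is only a sketch under assumptions: `BufferBench` and `BuilderBench` are hypothetical main classes that each time a single benchmark and print their own result, and they are assumed to be on the current classpath.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class IsolatedRuns {
    // Builds the command line for a fresh JVM that runs one benchmark class.
    static List<String> command(String mainClass) {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        String classpath = System.getProperty("java.class.path");
        return Arrays.asList(java, "-cp", classpath, mainClass);
    }

    // Starts the child JVM, forwards its output, and returns its exit code.
    static int runInFreshJvm(String mainClass) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(mainClass));
        pb.inheritIO();
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical benchmark classes; replace with real ones.
        for (String bench : new String[] {"BufferBench", "BuilderBench"}) {
            int exit = runInFreshJvm(bench);
            System.out.println(bench + " exited with code " + exit);
        }
    }
}
```

Each benchmark should measure its own elapsed time inside the child process; timing from the parent would also include JVM startup.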

Anyway, very interesting article, thanks for it !!!

Re: Microbenchmark by William Louth

Hi Michael, it could never effectively optimize away the loops, because constructors and method calls typically result in other indirect calls being made, as well as static field initialization and class loading events. The object reference might be discarded, but object construction is rarely as simple as some primitive operation, even in the case of these basic Java classes. There are other (temporal) object state issues that would have to be considered. In our minds the code seems futile and localized, but there are side effects that must be represented in the runtime behavior, which we ignore when viewing the code at face value.

Re: Premature conclusions.... by William Louth

Hi Dieter, the change in the timing for the StringBuffer could be caused by a number of things, including the garbage collection of StringBuilder objects created prior to the test's execution.

I am still not sure of the point in presenting such inaccurate figures, especially as someone might read this article online and miss the correction that is coming. The article content is good, but the benchmark reporting and the final tease at the end are distracting and weaken the rest of the content.

Re: Microbenchmark by Michael Bien

Thank you for your answer William,

I know I oversimplified a lot, and I don't have the experience to know what is possible in runtime optimizations and what is not.

But I am pretty sure that if the HotSpot compiler can't do that because of profiling vs. gain issues, you could do it in the bytecode compiler. In the worst case, a post-processor Ant task could insert a thread-local stack into the bytecode in such cases.

public static Vector3f average(Vector3f v1, Vector3f v2) {
    Vector3f tmp = new Vector3f(); // candidate for stack allocation
    tmp.add(v1, v2);
    return tmp;
}

You are right, the main problem is to distinguish value objects from objects which have side effects, but I think this is possible. The first step is method-local escape analysis; the next steps would be to determine whether add(), scale(), static{} and object construction have side effects.

(the main goal is to reduce gc stops not to increase execution performance... but both are related)

just a thought

The mistake of the cord by Tajima Kaz

Is one line perhaps missing from the following cord?

public static String concatToBuffer(StringBuffer sb, String s1, String s2, String s3) {






I think that "return sb.toString();" is necessary.
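For reference, a sketch of how the method could look with the missing line restored. The body lines were lost from the post above, so the three `append` calls are an assumption; only the signature and the `return sb.toString();` line come from this thread. The wrapper class name is hypothetical.

```java
public class ConcatFix {
    // Signature as quoted in the comment; the append calls are assumed.
    public static String concatToBuffer(StringBuffer sb, String s1, String s2, String s3) {
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString(); // the line reported as missing
    }

    public static void main(String[] args) {
        System.out.println(concatToBuffer(new StringBuffer(), "a", "b", "c"));
    }
}
```

Without the `return` statement the method would not compile, since it declares a `String` return type.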

My post mistake by Tajima Kaz

Sorry, cord -> code

Re: The mistake of the cord by Jeroen Borgers

Thanks Kaz.
Mistakes are easy to make :-)
Will be corrected.


Great article by Leandro Iriarte

Hello Jeroen,

Great article! I would like to put a piece of it on my site (with your name and a link, obviously), and I would also like to translate it into Spanish (I can send you a copy if you want).

My site is

Please reply; I'll be waiting for your authorization.


Re: Great article by Jeroen Borgers

Hi Leandro,

Sure, no problem. Please send me a copy.
Sorry for the delay, I was on vacation.

