The term "Heisenbug" will probably not be found in Webster’s list of new or most frequently used words this year. Unfortunately, as software engineers we are all too familiar with this sinister creature. A pun on the Heisenberg Uncertainty Principle - a quantum physics concept that holds that observers affect what they are observing by the mere act of observing it - the term "Heisenbug" is playfully assigned to a computer bug that occurs when least expected in production, but which all attempts to reproduce when the instruments are connected turn up empty. As a result, it is almost impossible to simultaneously reproduce both the underlying circumstances and the bug itself.
There is, however, a foolproof way to produce a heisenbug:
(1) Write a concurrent program and be sure to ignore concurrent concepts such as publication, escape, the Java Memory Model and byte-code reordering.
(2) Test the program thoroughly. (Don’t worry, all tests will pass!)
(3) Release the program to production.
(4) Wait for the next production change freeze and check your inbox. In an instant, you will find a spate of vitriolic emails lambasting both you and your buggy application.
Before addressing what we can do to avoid such heisenbugs, a more fundamental question might be even more appropriate: if programming concurrency is so difficult, why bother to do it at all? In actual fact, there are a number of reasons to bother:
Parallelism - As the number of processors and cores increase, multithreading allows programs to leverage parallelism to ensure that our programs are not overburdening one core while others sit idly by. Even on a single core machine, applications that are not busy performing calculations, perhaps because they are I/O-bound or waiting for some other subsystem, will have free processor cycles. Concurrency enables programs to use those free cycles to optimize performance.
Fairness - If two or more clients are waiting for access to a subsystem, it would be undesirable to have each client wait for the previous client to complete before executing. Concurrency allows our program to assign each client request to a thread, thereby minimizing perceived latency.
Convenience - It is often easier to write a series of small tasks that operate individually rather than to create and coordinate a large program to manage all of those tasks.
These reasons however, do not mitigate the fact that concurrent programming is difficult. If we programmers do not take it fully into account in our applications, we are exposing ourselves to serious heisenbugs. In this article we will introduce ten tips to keep in mind when architecting or developing concurrent applications in Java.
Tip 1 - Inform yourself
I suggest thoroughly reading Java Concurrency in Practice by Brian Goetz. This bestselling classic from 2006 covers Java concurrency from the ground up. The first time I read this book (you will find yourself referring to it again and again) I acquired a pit in my stomach just thinking of all of the infractions I had committed over the years. For a thorough deep dive into all aspects of Java concurrency, based on the book Java Concurrency in Practice and created by Java Specialist Dr. Heinz Kabutz and endorsed by Brian Goetz, think about taking the Concurrency Specialist training.
Tip 2 - Use available expert resources
Use the java.util.concurrent package introduced in Java 5. If you do not have a great deal of experience with all of the components in that package, I recommend you download the self-executing Java Concurrent Animated application from SourceForge which contains a series of animations (written by yours truly) illustrating each of the concurrent package components. The download serves as an interactive catalog to which you can refer when architecting your own concurrency solutions.
When Java first came on the scene at the end of 1995, it was one of the first programming languages that included multithreading as a core language feature. Because concurrency is so difficult, we found some very good programmers writing some really poor concurrent programs. Shortly afterwards, ranking concurrency guru Professor Doug Lea of Oswego State University published his magnum opus, Concurrent Programming in Java. Now in its second edition, this work introduced a catalog of concurrency design patterns that became the foundation of the java.util.concurrent package. Doug Lea taught us that the concurrency code that we might previously have inlined into our classes could be extracted into individual components that could be certified by concurrency specialists, allowing us to concentrate on our program logic without becoming overly encumbered with the vagaries of concurrency. By familiarizing ourselves with that package and using it in our programs, we can substantially reduce the risk of introducing concurrency errors.
Tip 3 - Be aware of pitfalls
Concurrency is not an advanced language feature, so do not think for a moment that your program is immune to thread-safety concerns. All Java programs are multithreaded: not only does the JVM create threads such as the garbage collector, finalizers and shutdown hooks, but other introduce their own threads as well. For example Swing and the Abstract Window Toolkit (AWT) introduce the Event Dispatch Thread. The Remote Method Invocation (RMI) Java application programming interface, as well as so-called Model-View-Controller (MVC) web frameworks such as Struts and Spring MVC introduce a thread per call and are particularly vulnerable. All of these threads can call programming hooks in your code that potentially modify the state of your application. If multiple threads are permitted to access program-state variables in read or write operations without proper synchronization, then your program is faulty. Be aware of this and be proactive in coding for concurrency.
Tip 4 - Take a straightforward approach
Code first for correctness, then correct for performance. Techniques such as encapsulation can make your program thread-safe, but can potentially slow it down. Nonetheless, HotSpot will generally do a good job optimizing these away, so it is good practice to first ensure that your program runs correctly and only worry about performance tweaks later. All too often we find that clever optimizations, that make our code difficult to understand and maintain, just do not produce material savings in time. Avoid such optimizations no matter how elegant or clever they may appear to be. In general, it is advisable to write nonoptimized code and then use a profiler to find the hot spots and bottlenecks and attempt to correct those. Aim for the biggest bottlenecks first, then reprofile; you will frequently find that by correcting one hot spot you have automatically corrected a few of the next most severe ones. It is an order of magnitude easier to write your program correctly from the start than it is to try to retrofit for safety later, so keep it simple and correct and diligently document all concurrency assumptions from the outset. Using concurrency annotations such as @ThreadSafe, @NotThreadSafe, @Immutable and @GuardedBy can be a big help in documenting concurrency assumptions into your code.
Tip 5 - Keep track of atomicity
In concurrent programming, an operation or set of operations is atomic if it appears to the rest of the system to occur in a single instant between its invocation and its response. Atomicity is a guarantee of isolation from concurrent processes. Because Java ensures that 32-bit operations are atomic, setting integers and floats will therefore always ensure "out-of-thin-air" safety at the very least. Out-of-thin-air safety is achieved when multiple threads that concurrently set a value are guaranteed to produce one of those values, rather than some mutated hybrid value "out of thin air". 64-bit operations involving long and double variables, however, offer no such guarantee; the Java-language specification allows one 64-bit set to be treated as two 32-bit sets in a non-atomic way. As a result, when using long and double variables, threads that attempt to modify a variable concurrently might produce unexpected and unpredictable results in that the final value of the variable might be something that was neither set by the first thread nor the second, but rather some combination of bytes set by both. If, for example, thread A is setting the hexadecimal value 1111AAAA and thread B is setting a value 2222BBBB, it is very possible that the resulting value will be a hybrid variation 1111BBBB, which was set by neither. This can easily be corrected by declaring such variables as volatile, which tells the runtime to treat setters atomically. A volatile declaration is not a panacea, however; although it ensures that setters are atomic, further synchronization is required to ensure that other operations on variables are atomic. If, for example, "count" is a long variable, then it is entirely possible that calls to the incrementing function count++ could be lost. This is due to the fact that despite appearing to be a single operation, ++ is in actual fact a combination of the three operations lookup, increment and set. If concurrent threads execute the operations in an inauspicious interleaving sequence, it is possible that the two threads will perform the lookup operation concurrently and obtain the same value, causing the increment operations to yield an identical result, with the consequence that the set operations will collide, producing the same value as well. If "count" is a hit counter, a hit is lost. When multiple threads could concurrently perform check, update and set operations, be sure to perform them in a synchronized block or use a java.util.Lock implementation such as ReentrantLock.
Many programmers are under the erroneous impression that it is only necessary to synchronize when writing - but not when reading - values. This is a misconception. For example, if the former is synchronized but the latter is not, it is very possible that the reader thread will not see the value written by the writer thread. While this might appear to be a bug, it is actually an important feature of Java. Threads often execute on different CPUs or cores and the design of processors is such that moving data from one core to another is not a quick operation. Java is aware of this and consequently allows each thread to abstract a copy of its state by default when the thread is launched; that original state can subsequently be accessed following more recent nonqualified state changes that may have taken place in different threads. Although declaring a variable as volatile will guarantee visibility, it still does not guarantee atomicity. You have both the power (as well as the obligation) to code around this by properly synchronizing where necessary.
Tip 6 - Confine threads
One way to prevent threads from competing with each other for access to shared data, is to not share! If a particular data point is only ever accessed by a single thread, then there is no need to think about additional synchronization. This technique is called "Thread Confinement".
One way to confine threads is to make objects immutable. While immutable objects are more expensive in terms of the number of objects produced, they are sometimes exactly what the doctor ordered in terms of maintenance.
Tip 7 - Respect the Java Memory Model
Be aware of the Java Memory Model contract for variable visibility. The so-called "Program-Order Rule" states that when a variable is set in a thread, it remains visible to that thread from that time forward, independent of any synchronization. When an intrinsic lock on M is acquired by thread A by virtue of its invocation of synchronized(M) and that lock is subsequently released and acquired by thread B, everything that was set by thread A prior to releasing the lock - including (perhaps unexpectedly) those variables that were set before the lock was obtained - will be visible to thread B after it acquires the lock on M. Releasing a lock acts as a kind of memory commit for any values that were visible to the thread. (See illustration below). Note that the new locking components Semaphore and ReentrantLock in java.util.concurrent exhibit the same semantics. Volatile variables also have similar dynamics: when a thread A writes to a volatile variable X, it is analogous to exiting a synchronized block; when thread B reads that variable X, it is analogous to entering a synchronized block on the same variable, meaning everything that was visible to thread A when it wrote to X will be visible to thread B after it reads X.
Tip 8 - Number of threads
Determine how many threads are correct for your application. A formula can be developed to formally perform this calculation with the following variables:
Let T be the ideal number of threads we are aiming to derive
Let C be the number of CPUs
Let X be the percent of utilization used by each process
Let U be the target percent utilization.
If we had only one CPU which was 100% utilized, then we would only want one thread. When utilization is 100%, the rule specifies that the number of threads should equal the number of CPUs, as represented by the formula T=C. If, however, we see that each CPU is only x percent utilized, then we can inflate the number of threads by dividing the number of CPUs by x, with the result that T=C/x. If x is .5, for example, then we can have twice as many threads as CPUs, which would produce 100% utilization on all threads. If we only want to produce a target utilization of U%, then we must multiply by U, so the formula becomes T=UC/x. Finally, if we view a thread as being p% processor-bound - i.e., computing - and n% not-processor-bound - i.e., waiting - then obviously x=p/(n+p). (Note that n%+p%=100%). Substituting these variables into the formula above yields the following result:
T=CU(n+p)/n or equivalently
In order to determine p and n, you can equip your threads with time logging before and after each CPU and non-CPU call, such as I/O- and JDBC-calls, and then analyze the relative times of each. Of course, not all threads are going to display equal metrics, so you have to take the average; this should nevertheless give you a good rule of thumb to start your configuration tweaking. It is important to get this approximately right, because introducing too many threads can actually degrade performance. It is also important to keep the number of threads configurable, so that as you adjust your hardware configuration you can easily tweak the thread counts. Once you have derived a value for the thread count, pass that value to a FixedThreadPoolExecutor. Because the operating system takes care of the context switching between processes, set U% low enough to accommodate other processes. You should only consider your own process for the remaining calculations in this formula, however.
The good news is that there is some wiggle, so if you get the thread count wrong by up to 20% or so, you probably won’t see a major performance impact.
Tip 9 - Develop in server mode
Perform your development in server mode, even if you are developing a client application. The server-mode flag acts as a hint to the runtime environment that it can perform certain byte-code reorderings to achieve performance optimizations. While such compiler optimizations occur more frequently in server mode than in client mode, they can still be found in the latter. Use server mode to expose such optimizations early during the testing process rather than waiting for a production release.
But do test using -server, -client, and -Xcomp (to eliminate runtime profiling, performing maximal optimization beforehand.)
To set server mode, call java with the -server switch. My friend Dr. Heinz Kabutz, of the Java Specialist Newsletter and Java Masters Course fame, points out that some JVMs (such as Apple OSX until recently) in 64-bit mode will ignore the -server switch, so you might also want to include the -showversion switch, which will display the version at the top of the program log, including the client or server mode. If you are using Apple OSX, you can use the -d32 switch to switch to 32-bit mode. In short, test using -showversion -d32 -server and in general check the logs to ensure you are using the Java version you are expecting. Similarly, Dr. Heinz suggests using separate tests switching between the flags -Xcomp (which removes JIT runtime optimizations) and -Xmixed (-Xmixed is the default, which optimizes hot spots) when executing your tests.
Tip 10 - Test concurrency code
We all know how to build unit testers to test the post-conditions of the method calls in our code. This is a difficult routine that we have become accustomed to performing over the years, because we understand the value in time savings it will yield as we gravitate from development to maintenance. The most important aspect of programming we can test, however, is concurrency. This is because it represents the most fragile code and is where we can expect to find the majority of bugs. Ironically, it is in this regard that we tend to be most remiss, primarily because concurrency testers are so difficult to produce. This is attributable to the nondeterministic nature of thread interleaving - it is not generally possible to predict which threads will execute in which order, as these will vary from machine to machine or even from process to process on the same machine.
A technique for testing concurrency is to use the concurrency components themselves to produce the tests. To simulate arbitrary thread timings, for example, think about the various possibilities for the sequencing of threads and use scheduled executors to deterministically order the threads in accordance with those possibilities. When testing concurrency, remember that when a thread throws an exception, it will not cause JUnit to fail; JUnit only fails when the main thread encounters an exception, not when any other thread does so. One way to get around this is to use a FutureTask to specify your concurrent task instead of a Runnable. As the FutureTask implementation implements Runnable, it can be submitted to a scheduled Executor. FutureTask can accept as a constructor parameter either a Callable or a Runnable. Callable is similar to Runnable except for two notable differences: Callable returns a value while Runnable returns void and Callable throws the checked ExecutionException while Runnable can only throw unchecked exceptions. ExecutionException has a getCause method that will return the actual exception that triggered it. (Note that if you pass in a Runnable you must also pass in a return object. FutureTask will wrap the Runnable and return object in a Callable internally, so you will still have the benefits of a Callable.) By using FutureTask you can schedule your concurrency code and then have the JUnit main thread call get on the result. The get method is designed to rethrow any exception that was thrown asynchronously by its Callable or Runnable.
There are challenges, however. Suppose you have a method that is expected to block and you want to test that it actually does what is expected. How do you test for that block? How long do you wait before you decide that the method actually did block? FindBugs will locate some concurrency issues, and you might want to consider using a concurrency testing framework. MultithreadedTC co-created by Bill Pugh, co-author of FindBugs, provides an API for specifying and testing all interleaved variations, as well as other concurrency specific features.
One important point to keep in mind when testing - Heisenbugs do not rear their ugly heads on every run, so the result is not a simple yes/no answer. Loop through your concurrency testers many (thousands of) times and get a statistical metric in terms of averages and standard deviations to measure your success.
Brian Goetz has warned that visibility failures will become more prevalent over time, as chip designers move to weaker memory models and increasing core counts make for more interleavings.
Therefore let’s avoid at all costs taking concurrency for granted. Let’s be aware of the inner workings of the Java Memory Model and let’s use java.util.concurrent as much as possible to exterminate heisenbugs in concurrent programs, leaving our inbox free for e-mails containing nothing but outpourings of satisfaction and praise.
About the Author
Recently inducted as an Oracle Java Champion, Victor Grazi has been at Credit Suisse since 2005, where he works in Investment Banking Architecture on platform architecture, and as a technical consultant and Java evangelist. He is also a frequent presenter at technical conferences where he speaks about his first love, Java concurrency, and other Java related topics. Victor is contributor and trainer on the popular "Concurrency Specialist Course" and hosts the "Java Concurrent Animated" open source project on SourceForge. He lives in Brooklyn, NY with his wife and family.
Great tips but is there a better way?
Great article and thank you for the tips. I have found that no matter how much the development community is advised of the perils of concurrent programming we continue to fall into the same traps. Admittedly most of the problems could be avoided by following the tips in your article but unfortunately this often does not happen. I would be interested to know the total
Goetz book is very good but there is a better way
Programming Concurrency on the JVM: Mastering Synchronization, STM, and Actors
by Venkat Subramaniam
It provides many ways to minimize the risks, and the best is to avoid threads completely !
Look at this book is has excellent coverage of classic solutions, and many alternatives solutions that are more modern and much less complex.
It is a must read.
Re: Great tips but is there a better way?
..... the total cost of concurrent programming errors but I suspect it would be big.
I wonder should we be advising developers to avoid using thread based concurrency wherever possible? I suspect most developers could master using an event model and non blocking i/o faster than thread based concurrency. In a lot of cases this would eliminate the need for multi threading.
Admittedly this would not make use of multiple cores or multiple processors but this could be achieved using multiple processes and IPC. Alternatively multiple threads could be used specifically when the developer wants to take advantage of multiple cores or processors. But this could be achieved in a lof of cases without sharing data across threads.
Threads are very useful but I think we would be in better shape if we limited thread use to specialised cases and developers experienced in concurrency.
Re: Great tips but is there a better way?
Look at Akka for example. Simple and efficient, no locks, no threads and distributed if needed.
better way or not - not so relevant right now
IMHO, frameworks like Akka are great progress towards what we all need: a truly distributed platform where the threads can be actually distributed (imagine a JVM that can run on top of a cluster and when you create a thread it can actually be distributed transparently).
Re: better way or not - not so relevant right now
I agree that akka is a great platform for the JVM. It was inspired by Erlang which was created a long time ago by Ericsson. It takes away a lot of danger from the developer by using the Actor model. Essentially this is a type of event model where the events are the receipt of messages. Threads and Actors ( for those not familiar with the actor just think of an object that responds to messages and is guaranteed to be thread safe by the runtime) are decoupled by messages. So methods cannot be invoked directly. This ensures that method calls are non blocking - think fire and forget.
The actor model lends itself well to most problems we attempt to solve for busseses today. This requires developers to think in a different way but is much easier than using threads directly. Threads are still used but the platform ensures that an Actor is only use by one thread at a time so developers can be oblivious to concurrency issues.
However I think there are still a couple of things to be concerned about
1 - the platform only guarantees non blocking invocation on actors. We can still cause problems for ourselves with blocking i/o. This is something that needs to be considered when we design a system.
2 - The remoting mechanism attempts to hide the distributed nature of the system from the developer. Admittedly the platform does a good job of this but there is still the danger that developers will make some of the assumptions described by Peter Deutsch back in the mid 90s in The Eight Fallacies of Distributed Computing.
All that being said, akka is a great platform for any developer given the option to use it.
Re: better way or not - not so relevant right now
There is also a fact that is important, the exchanged message in Akka should be immutable objects to avoid concurency issues, this is a backbone principle of actor models.
As far as the eight fallacies:
The Eight Fallacies of
Essentially everyone, when they first build a distributed application, makes the following eight assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences.
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Akka does not assume the no 1, It is asynchronous so it does not assume no 2, I do not know about no 3, it has as Erlang mecanism for filure so it does not assume no 4, Akka allows for dynamic topology so no problem either for no 5, for no 6 there is no need for an administrator although there can be one or many, no 7 is a complex issue and not only software related, and for no 8 Akka does not seem to assume network homogeneity.
So I think it was designed with those possible flaw in mind, so they relieve the developper from this burden.
No threads and no synch, makes concurency much easier
Do not use -Xcomp
By forcing compilation before the first run, you are forsing the sophisticated C2 compiler to compile the code without any profiling information. In other words, it wouldn't know which look is hot, which branch is taken into the majority of cases and what classes are going to be loaded as a part of the application startup. Chances are that the generated code would be slightly and in some case significantly worse if you leave it with the default settings.
InfoQ needs message editing
For that matter:
- it wouldn't know which look is hot -> which loop is hot
- by the last sentence I meant that forcing early compilation generates worse code