BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles Discover and Diagnose Java Concurrency Problems Using Contemplate's ThreadSafe

Discover and Diagnose Java Concurrency Problems Using Contemplate's ThreadSafe

Bookmarks

Exploiting the benefits of multicore hardware has proven to be difficult and risky. Writing Java software that correctly and safely makes use of concurrency requires careful thought to take into account the effects of running in a concurrent environment. Software that incorrectly accounts for concurrency can contain intermittent defects that elude even the most rigorous testing regimes.

Static analysis offers a way to detect and fix concurrency defects before programs are ever executed. It accomplishes this by analysing the source code or compiled bytecode of the program to discover latent bugs hidden within code, long before it is executed.

Contemplate's ThreadSafe Solo is a commercial static analysis plugin for Eclipse, designed specifically to discover and help diagnose difficult concurrency bugs lurking within Java programs. By being focussed on concurrency bugs, ThreadSafe can find bugs that other static analysis tools, both commercial and freely available, often miss or are not designed to look for. As far as we have been able to determine, none of the defects in the examples below are caught by any other Java static analysis tool.

In this article I'll introduce ThreadSafe by describing a selection of concurrency bugs, in example and real-world OSS code, and show how ThreadSafe's advanced static analysis and tight Eclipse integration can be used to discover and diagnose these bugs, long before they have a chance to reach production. To try out ThreadSafe on your own code, you can download a free trial copy from the Contemplate website

Most of the concurrency bugs that I use as examples in this article are instances of when developers have not correctly synchronized accesses to shared data. This kind of bug is simultaneously the most common form of concurrency bug in Java code, and one of the most difficult to detect during code review or testing. ThreadSafe can detect many instances of cases when synchronization has been used incorrectly, and, as I show below, also provide the developer with crucial contextual information to help diagnose the problem.

Correct Synchronization Becoming Incorrect Over Time

When designing a class whose instances are intended to be invoked concurrently by multiple threads, developers must put careful thought into how concurrent accesses to the same instance ought to be correctly handled. Even when a good design has been found, it is not easy to ensure that a carefully designed synchronization protocol is respected by future additions to the code. ThreadSafe can help to point out cases when existing synchronization designs are violated by newly written code.

For basic synchronization tasks, Java provides several different facilities, including the synchronized keyword and the more flexible java.util.concurrent.locks package.

As a simple example using Java's in-built synchronization facilities to provide safe concurrent access to a shared resource, consider the following code snippet, implementing a toy "Bank account" class.

public class BankAccount {
protected final Object lock = new Object();
private int balance;
protected int readBalance() {
return balance;
}
protected void adjustBalance(int adjustment) {
balance = balance + adjustment;
}
// ... methods that synchronize on "lock" while calling
// readBalance() or adjustBalance(..)
}

The developer of this class has decided to provide access to the balance field via two internal API methods readBalance() and adjustBalance(). These methods have been given protected visibility so that they may be accessed from sub-classes of BankAccount. Since any particular publicly exposed operation on BankAccount instances may involve a complex sequence of calls to these methods, which ought to be executed as a single atomic step, the internal API methods do not perform any synchronization themselves. Instead, the caller of these methods is expected to synchronize on the object stored in the lock field to ensure mutual exclusion and atomicity of updates to the balance field.

As long as programs are kept small, and the design of a program can be kept in a single developer's head, there is a relatively small risk of concurrency-related problems. However, in any real-world project, the original carefully designed program will be extended to accomodate new functionality, often by engineers new to the project.

Now imagine that some time after the original code was written, another developer writes a sub-class of BankAccount to add some new optional functionality. Unfortunately, the new developer is not necessarily aware of the synchronization design put in place by the previous developer, and does not realise that the methods readBalance() and adjustBalance(..) must not be called without first synchronizing on the object stored in the field lock.

The code that the new engineer writes for a sub-class of BankAccount looks like the following:

public class BonusBankAccount extends BankAccount {

    private final int bonus;

    public BonusBankAccount(int initialBalance, int bonus) {
        super(initialBalance);

        if (bonus < 0)
            throw new IllegalArgumentException("bonus must be >= 0");

        this.bonus = bonus;
}
public void applyBonus() {
adjustBalance(bonus);
}
}

The problem lies in the implementation of the applyBonus() method. In order to correctly adhere to the synchronization policy of the BankAccount class, applyBonus() ought to have synchronized on lock while calling adjustBalance(). Unfortunately, no synchronization is performed and so the author of BonusBankAccount has introduced a serious concurrency defect.

As serious as this defect is, detecting it in testing or even production can be very difficult. This defect will manifest as inconsistent account balances, as updates to the balance field performed by threads are not made visible to other threads due to the lack of synchronization. The program will not crash, but will silently produce incorrect results, in an untraceable way. A particular run of an experiment with four threads that concurrently applied bonuses and credits to the same account lost 11 out of 40,000 transactions when run on quad-core hardware.

ThreadSafe can be used to identify concurrency defects like the example introduced into the BonusBankAccount class. Running ThreadSafe's Eclipse plugin on the two classes above produces the following output:

(Click on the image to enlarge it)

Screenshot of the ThreadSafe view in Eclipse

This screenshot shows that ThreadSafe has discovered that the field balance has been inconsistently synchronised.

To get more context information, we can ask ThreadSafe to show us the accesses to the balance field, and also to show us the locks that were held for each access:

(Click on the image to enlarge it)

Screenshot of ThreadSafe's Accesses View

From this view, we can easily see that the accesses to the balance field in the adjustBalance() method are inconsistently synchronized. Using Eclipse's call hierarchy view (here accessible by right clicking on the adjustBalance() line in the view), we can see where the offending code path comes from.

Screenshot of Eclipse's call hierarchy showing the BonusBankAccount call to adjustBalance

Incorrect Synchronization While Accessing Collections

The BankAccount class above demonstrates a very simple case of incorrect synchronization when accessing fields. Of course, most Java objects are composed of other objects, often in the form of collections of objects. Java provides a rich assortment of collection classes, each with its own requirements on whether or not synchronization is required for concurrent access to the collection.

Inconsistent synchronization on collections can be particularly harmful to program behaviour. While incorrectly synchronizing accesses to a field may "only" result in missed updates or stale information, incorrectly synchronizing accesses to collections that have not been designed for concurrent use can lead to violations of the collections' internal invariants. Violation of a collection's internal invariants may not immediately cause visible effects, but may cause odd behaviour, including infinite loops or corrupted data, at a later point in the program's execution.

An example of inconsistent use of synchronization when accessing a shared collection is present in Apache JMeter, the popular open source tool for testing application performance under load. Running ThreadSafe on version 2.10 of Apache JMeter produces the following warning:

Screenshot of the inconsistent synchronisation warning on the collection stored in the field RespTimeGraphVisualizer.internalList : List<RespTimeGraphDataBean>

As before, we can ask ThreadSafe for more information on this report by showing us the accesses to this field along with the locks that are held:

(Click on the image to enlarge it)

Screenshot of ThreadSafe's Accesses View investigating internalList

Now we can see that there are three methods that access the collection stored in the field internalList. One of these methods is actionPerformed, which will be invoked by the Swing Gui framework on the UI thread.

Another method that accesses the collection stored in internalList is add(). Again, by investigating the possible callers of this method, we can see that it is indeed called from the run() method of a thread that is not the main UI Thread of the application, indicating that synchronization ought to have been used.

Screenshot of Eclipse's call hierarchy showing the run() method

Missing Synchronization while using the Android Framework

The concurrent environment in which an application may run is almost never under the complete control of the application developer. Frameworks invoke various parts of applications in response to user, network, or other external events, and often have implicit requirements about which methods may be invoked on which threads.

An example of the incorrect use of frameworks is visible in recent Git versions of the K9Mail email client for Android (we provide a link to the precise version tested at the end of this article). Running ThreadSafe on K9Mail brings up the following warning, indicating that the field mDraftId appears to be accessed from an Android background thread, and another thread, with no synchronization.

ThreadSafe's report on unsynchronized accesses from an asynchronous callback method

Using ThreadSafe's accesses view, we can see that the field mDraftId has been accessed from a doInBackground method.

(Click on the image to enlarge it)

ThreadSafe's Accesses View showing information about individual accesses to mDraftId

The doInBackground method is part of the Android framework's AsyncTask facilities for running time-consuming tasks in the background separately from the main UI thread. Correct use of AsyncTask.doInBackground(..) can ensure that Android applications remain responsive to user input, but care must be taken to ensure that interaction between the background thread and the main UI thread is correctly synchronized.

Investigating further, using Eclipse's call hierarchy feature, we discover that the onDiscard() method, which also accesses the mDraftId field, is called by the onBackPressed() method. This method is in turn always invoked by the Android framework on the main UI thread, not the background thread for running AsyncTasks, indicating the presence of a potential concurrency defect.

Incorrect use of Synchronised Data Structures

For relatively simple scenarios, Java's built-in synchronized collections facility can provide the right level of thread safety without too much effort.

Synchronized collections wrap existing collections, providing the same interface as the underlying collection, but synchronizing all accesses on the synchronized collection instance. Synchronized collections are created by using calls to special static methods similar to the following:

private List<X> threadSafeList =
         Collections.synchronizedList(new LinkedList<X>());

Synchronized collections are relatively easy to use compared to some other thread-safe data structures, but there are still subtleties involved in their use. A common mistake with synchronized collections is to iterate over them without first synchronizing on the collection itself. Since exclusive access to the collection has not been enforced, it is possible for the collection to be modified by other threads while iteration over its elements takes place. This may result in ConcurrentModificationExceptions being thrown intermittently at runtime, or non-deterministic behaviour, depending on the exact scheduling of threads. The requirement to synchronize is clearly documented in the JDK API documentation

It is imperative that the user manually synchronize on the returned list when iterating over it:
List list = Collections.synchronizedList(new ArrayList());
   ...
synchronized (list) {
Iterator i = list.iterator(); // Must be in synchronized block while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic behavior.

Nevertheless, it is easy to forget to synchronize when iterating over a synchronized collection, especially because they have exactly the same interface as normal, unsynchronized, collections.

An example of this kind of mistake can be found in version 2.10 of Apache JMeter. ThreadSafe reports the following instance of "Unsafe iteration over a synchronized collection":

Screenshot of ThreadSafe's unsafe iteration report

The line reported by ThreadSafe contains the following code:

Iterator<Map.Entry<String, JMeterProperty>> iter = propMap.entrySet().iterator();

Here, the iteration is over a view on the synchronized collection, created by the call to entrySet(). Since views on collections are "live", this code has the same potential for non-deterministic behaviour and ConcurrentModificationExceptions as discussed above.

Conclusions

I have presented a small selection of concurrency defects commonly found in real-world Java programs, and shown how Contemplate's ThreadSafe can be used to discover and diagnose them.

In general, static analysis tools offer a promising way to mitigate against the risk of defects lurking within existing, and new, Java code. Static analysis complements traditional software quality techniques such as testing and code review by providing a quick and repeatable way of scanning code for well-known but subtle and serious bugs. Concurrency bugs in particular are extremely difficult to reliably find by testing, due to their dependence on the non-deterministic scheduling of concurrent threads.

ThreadSafe is capable of discovering a range of other concurrency defects, including atomicity errors arising from incorrect use of the concurrent collections framework, and potential deadlocks from the misuse of blocking methods. The ThreadSafe technical briefing and demonstration video provide further examples of the subtle but potentially catastrophic defects that ThreadSafe can detect.

Resources

About the Author

Robert Atkey is a Senior Software Engineer at Contemplate Ltd., where he architects, develops and maintains the core analysis framework. He graduated from Edinburgh University in 2006 with a PhD in Computer Science, and has since worked on practical and theoretical aspects of software, in both academia and industry.

Rate this Article

Adoption
Style

BT