Dr. Cliff Click, a Distinguished Engineer at Azul Systems, gave a talk at this year's JavaOne describing a set of techniques that have taken him a long way towards a scalable, non-blocking coding style in Java. In broad terms, his approach implements non-blocking algorithms such that stopping any one thread does not stop global progress.
The major components of Click’s work are:
- A large, fast array holding all the data, which allows rapid parallel reads and also supports a parallel, incremental, concurrent copy.
- Atomic update of individual array words (using java.util.concurrent.atomic.*). The atomic update uses either Compare-and-Swap (CAS) on Azul, SPARC and x86 processors, or Load-Linked/Store-Conditional (LL/SC) on IBM's platforms (a minimal sketch follows this list).
- A Finite State Machine (FSM), built from the atomic update and logically replicated per array word. The FSM controls writes and also supports array resize.
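As a rough illustration of these building blocks, the sketch below (the names and structure are ours, not Click's code) shows a word array whose slots are only ever changed by a single CAS via java.util.concurrent.atomic:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustrative sketch only (not Click's actual source): each word in the
// table is advanced with a single CAS, the primitive the per-word FSM
// is built from.
final class AtomicWordTable {
    private final AtomicReferenceArray<Object> slots;

    AtomicWordTable(int size) {
        slots = new AtomicReferenceArray<Object>(size);
    }

    // Attempt one FSM step on slot i: succeeds only if the slot still holds
    // the expected value. A false return means another thread's CAS won,
    // which is still global progress.
    boolean tryAdvance(int i, Object expected, Object next) {
        return slots.compareAndSet(i, expected, next);
    }

    Object read(int i) {
        return slots.get(i); // plain volatile read; readers never block
    }
}
```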
With these basic elements in place, Click constructs an algorithm from many FSM steps that is lock-free, i.e. every CAS attempt represents progress: a successful CAS is a local success and advances the state machine, while a failed CAS means some other thread's CAS succeeded, so the losing thread simply re-reads and retries.
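A minimal sketch of such a retry loop, with a hypothetical three-state FSM standing in for Click's per-word state machines:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative retry loop (names are hypothetical): a failed CAS implies a
// competing CAS succeeded, so the system as a whole always makes progress.
final class FsmSlot {
    enum State { EMPTY, PARTIAL, DONE }

    private final AtomicReference<State> state =
            new AtomicReference<State>(State.EMPTY);

    void advanceTo(State next) {
        while (true) {
            State cur = state.get();
            if (cur.ordinal() >= next.ordinal())
                return;                 // another thread already advanced the FSM
            if (state.compareAndSet(cur, next))
                return;                 // our CAS won: local success
            // CAS failed: some other CAS succeeded; re-read and retry
        }
    }
}
```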
Click has implemented two examples (a bit vector and a hash table), with the code available on SourceForge, and is working on a third (a FIFO queue). To take one example in slightly more detail, the hash table is an array of key-value pairs, with keys in even slots and values in odd slots. Each word is CASed separately, but the state machine spans both words and, during a resize copy, includes words from both the old and new arrays. The hash table implementation supports concurrent insert, remove, test and resize, and passes the Java Compatibility Kit (JCK) tests for ConcurrentHashMap. On Azul's hardware it scales linearly to 768 CPUs, sustaining more than a billion reads per second simultaneously with more than 10 million updates per second.
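The slot arithmetic is easy to sketch. The fragment below shows only the even/odd pair layout and a two-CAS insert; the hashing, reprobing and copy machinery the real implementation needs is left out, and reference-equality keys are a simplification:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of the even/odd key-value layout described above.
final class KvPairs {
    private final AtomicReferenceArray<Object> kvs; // key at 2n, value at 2n+1

    KvPairs(int pairs) {
        kvs = new AtomicReferenceArray<Object>(2 * pairs);
    }

    // Two independently CASed words; the logical state machine spans the pair.
    boolean putIfAbsent(int n, Object key, Object val) {
        int ki = 2 * n, vi = 2 * n + 1;
        if (!kvs.compareAndSet(ki, null, key) && kvs.get(ki) != key)
            return false; // slot owned by a different key: caller would reprobe
        return kvs.compareAndSet(vi, null, val);
    }

    Object get(int n) {
        return kvs.get(2 * n + 1); // lock-free read of the value word
    }
}
```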
InfoQ spoke to Dr. Click to get a little more background on the work. During his JavaOne talk he highlighted a couple of issues with writing the hash table implementation in Java, so we asked him how suitable Java was for this kind of work. He responded that it was "actually fairly suitable... it has a well understood memory model (and well implemented). It's lacking in some fine-control issues, which I can ignore as they cost a little bit of performance. This lack of fine-control (i.e. direct ASM access) would be a problem with e.g. OS design or device drivers, but not for performant data structures."
We also asked when he would recommend using one of his data structures. His general advice was to use it when the "tried and true" alternatives are too slow to be useful:
“When a single data structure is highly contended; and you've already tried e.g. java.util.concurrent.ConcurrentHashMap. My stuff is generally slightly faster with no load (so no real reason to use it), and works much better
- with more than 32 cpus contending, or
- with a high percentage of writes vs reads.
Your mileage may vary, of course, so test before using.”
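Since the implementation passes the JCK tests for ConcurrentHashMap, it can in principle be swapped in wherever a ConcurrentMap is used. The fragment below assumes the package coordinates published on the SourceForge project page (org.cliffc.high_scale_lib); treat them as illustrative and verify against the actual download:

```java
import java.util.concurrent.ConcurrentMap;
// Assumed coordinates from the SourceForge high-scale-lib project:
import org.cliffc.high_scale_lib.NonBlockingHashMap;

public class SwapInExample {
    public static void main(String[] args) {
        // Both classes honour the same ConcurrentMap contract, so moving
        // off the "tried and true" baseline is a one-line change:
        // ConcurrentMap<String, Long> counts =
        //         new java.util.concurrent.ConcurrentHashMap<String, Long>();
        ConcurrentMap<String, Long> counts =
                new NonBlockingHashMap<String, Long>();
        counts.putIfAbsent("requests", 0L);
        counts.replace("requests", 0L, 1L);         // same semantics either way
        System.out.println(counts.get("requests")); // prints 1
    }
}
```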
There's a lot of activity around concurrency in Java at the moment, and Dr. Click's work tackles problems similar to those addressed by the fork/join framework being considered for Java 7. Although not a member of the expert group himself, Click is regularly consulted by its members.