Keeping Garbage Collection Pauses Short with Growing Heap Sizes: Q&A With Dr. Cliff Click
To achieve the throughput they require, a growing number of enterprise Java applications are moving the bulk of their processing from the database into memory. These applications are characterized by large amounts of live heap data and considerable thread-level parallelism, and often run on high-end multi-core processors. As a consequence, the strong correlation between heap size and garbage collection pause time is becoming one of the major limits on Java application scalability, and a great deal of R&D effort is being spent trying to remedy the situation.
For example, Java 7, which is expected to ship this year, will include a new garbage collector, Garbage-First, which aims to provide consistently low pauses over time, largely eliminating the trade-off between low latency and high throughput. In contrast to this software-only approach, Azul Systems' hardware, built around the company's own custom 54-core processor designed specifically to execute demanding Java applications, has support for write and read barriers built into the processor. InfoQ recently talked to Dr. Cliff Click, Chief JVM Architect at Azul Systems, and former architect and lead developer of the HotSpot Server Compiler, about Azul's approach. We started by asking where Azul's hardware is typically used today:
Anywhere you need reliable low pause times in business critical apps or very large heaps. Very-large-heap apps are things like financial modeling, where you suck into heap 300G of finance data, then run a few hundred CPUs in parallel across the data. We also do very well with Java DB caching, with 10's to 100's of Gigs in the cache.
Low-pause-time apps are typically those involving a human, where you want to present a web page back to the person in a timely fashion. Delays of more than a few seconds typically have the person thinking "the website is down" and going elsewhere or filing complaints. Some big-name companies run their web presence on Azul gear because we can give good (flat) response times under heavy loads. Some typical uses are customer portals, large caches (for both performance and scalability) and web versions of internal business apps (inventory control, 'vacation tool', etc.).
InfoQ: As I understand it, a key advantage of Azul's hardware is that it has direct support for write and read barriers, which allows you to obtain low GC pauses. Is that a fair summary?
Yup! In particular, having a read-barrier allows you to switch to a simpler GC algorithm - and a simple algorithm can be more easily made parallel, scalable, concurrent and robust. We switched algorithms years ago and our GC easily handles heaps (and allocation rates) an order of magnitude larger than any competition.
InfoQ: It is obviously possible to do this just in software. Are there situations when this would be worthwhile?
The academic literature has explored this space fairly well; reported penalties are a slowdown in single-threaded performance of between 10% and 20%. IBM's Metronome hard-real-time collector uses a Brooks-style read barrier and has worked very hard to bring the slowdown to only 30% over a standard collector... but some of that cost is due to being hard-real-time and not just the read barrier per se. IBM does sell Metronome (mostly to the military community, I believe).
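For readers unfamiliar with the term, a Brooks-style read barrier can be sketched in a few lines of Java. This is only an illustration of the general idea (every object carries a forwarding pointer, and every access follows it), not Azul's hardware barrier or Metronome's implementation. The names Obj, readBarrier and relocate are hypothetical, and the sketch ignores writes that race with the copy.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical heap object carrying a Brooks-style forwarding pointer. */
final class Obj {
    // Points to this object itself until a collector relocates it.
    final AtomicReference<Obj> forward = new AtomicReference<>();
    volatile int payload;

    Obj(int payload) {
        this.payload = payload;
        forward.set(this);
    }
}

final class BrooksBarrierSketch {
    /** Read barrier: every access goes through the forwarding pointer. */
    static Obj readBarrier(Obj ref) {
        return ref.forward.get();
    }

    /** Collector side: copy the object, then publish the new location with one CAS. */
    static Obj relocate(Obj old) {
        Obj copy = new Obj(old.payload);
        // Only the first relocation wins; later callers get the winner's copy.
        return old.forward.compareAndSet(old, copy) ? copy : old.forward.get();
    }

    public static void main(String[] args) {
        Obj a = new Obj(42);
        Obj stale = a;                 // a mutator still holds the old address
        Obj moved = relocate(a);       // the collector moves the object
        System.out.println(readBarrier(stale) == moved);  // true
        System.out.println(readBarrier(stale).payload);   // 42
    }
}
```

The extra load on every object access is where the software slowdown discussed above comes from; the appeal is that the collector can move objects while application threads keep running.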
InfoQ: How does what Azul is doing with GC pauses compare with Oracle's Garbage-First collector or using a Java real-time product?
I think G1 will be interesting ... when it's available. Our GC has been running in production stably and well for 4 years now. I think it's still premature to compare G1 numbers.
Real-time Java products tend to have a bunch of issues that make them not well suited for large business apps - typically the GC is limited either to 4 GB heaps or to a single collector (and sometimes a single mutator thread). The RTSJ spec requires a program rewrite to use scoped memory.
InfoQ: What do you see as the limits of parallelism in terms of GC - will we always have a portion of GC that is effectively non-parallelizable?
People can always make heaps that are difficult to collect in parallel, but in practice most large heaps have ample parallelism. Other portions of the GC problem can be successfully tackled piece-by-piece; we've been doing this work for years and have an extremely scalable (and parallel) concurrent GC. We can (and sometimes do) usefully run >100 GC threads in parallel.
InfoQ: Do you have any plans to open source the Azul VM (or contribute work back to the OpenJDK project)?
We are always looking to put portions of our work into Open Source, as it makes sense. E.g. our CheckedCollections and LockedCollections catch (or correct) very common programming errors where the standard not-locked Collections classes are being used by multiple threads and one of the threads is writing.
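The interview does not show what CheckedCollections looks like, so the following is only a rough sketch of the kind of check described: a wrapper around an unsynchronized list that fails fast when a second thread touches it while a write is in progress. CheckedList and its mutate and size methods are hypothetical names invented for this illustration and are not Azul's API.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical wrapper: detects overlapping use of an unsynchronized list. */
final class CheckedList<T> {
    private final List<T> delegate = new ArrayList<>();
    // Thread currently inside a write, or null when no write is in progress.
    private final AtomicReference<Thread> writer = new AtomicReference<>();

    /** Runs a mutation; throws if a write is already in progress (not reentrant). */
    void mutate(Consumer<List<T>> action) {
        if (!writer.compareAndSet(null, Thread.currentThread())) {
            throw new ConcurrentModificationException("write overlaps with " + writer.get());
        }
        try {
            action.accept(delegate);
        } finally {
            writer.set(null);
        }
    }

    /** Reads the size; throws if a different thread is in the middle of a write. */
    int size() {
        Thread w = writer.get();
        if (w != null && w != Thread.currentThread()) {
            throw new ConcurrentModificationException("read overlaps with write by " + w);
        }
        return delegate.size();
    }
}

final class CheckedListDemo {
    public static void main(String[] args) {
        CheckedList<String> list = new CheckedList<>();
        AtomicReference<Throwable> caught = new AtomicReference<>();
        list.mutate(l -> {
            l.add("a");
            // A second thread tries to write while this write is still open.
            Thread other = new Thread(() -> {
                try {
                    list.mutate(x -> x.add("b"));
                } catch (ConcurrentModificationException e) {
                    caught.set(e);
                }
            });
            other.start();
            try {
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(caught.get() != null); // true
        System.out.println(list.size());          // 1
    }
}
```

A "locked" variant would correct the error instead of reporting it; in the standard library, java.util.Collections.synchronizedList provides that behavior for individual method calls.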