Azul Systems To Open Source Significant Technology in Managed Runtime Initiative
More and more production servers are running managed code, be it written in Java, .NET, Ruby or Python. Yet the fundamental design of both the commodity hardware and the operating systems that run today's server loads has never been optimised for this sort of work. A familiar problem that this causes is the pauses that occur during Garbage Collection (GC).
For many enterprise architects and developers, the trade-off between keeping more of a program in memory on the one hand and GC pauses on the other is a now-familiar battle. In Enterprise Java the impact of these pauses is mitigated by distributing programs so that heaps can be kept smaller, and by resorting to other techniques, such as using asynchronous calls to more "batch-like" processing where response time is less of an issue than throughput.
Whilst these sorts of solutions are reasonable as an engineering approach, the result is clearly not ideal. During our conversation, Vice President of Technology and CTO for Azul Systems, Gil Tene, drew a parallel between the current situation with cores and the extended and enhanced memory-mode workarounds of the DOS era, which were designed to take advantage of memory above the 640KB limit if the PC supported that capability.
Our overall experience and projections show that a single Xeon core can now happily generate a sustained ~0.5GB/sec of new objects when running common enterprise workloads. Since each socket now holds 6-8 such cores, and commodity systems priced at under $20K now hold anywhere from 12 to 48 such cores, this translates to a sustained allocation rate of 5-15GB/sec in modern commodity systems (assuming their CPU capacity is actually being used for useful work).
The above (~0.5GB/sec) is fuzzy, subjective, and anecdotal, and the actual range we see varies (anywhere from 150MB to 1+GB/sec per core) depending on workload. You'll find higher allocation rates on more "transformative" workloads (e.g. message busses, web apps, integration servers, DB-centric and transactional workloads, data caches, etc), and lower allocation rates in numeric-computation workloads (e.g. Monte Carlo simulations, encryption, compression, finite element analysis, etc).
Some supporting data can be found in common industry benchmarks. I'm not a big fan of any of these benchmarks, mostly because they are all built to ignore real-world GC effects. The big blind spot all these benchmarks have in common with regards to GC comes out of necessity - they are all artificially built to survive and ignore GC effects during their timing runs, or they would not be able to fill up a modern server and measure it. By fitting tightly into generational assumptions and avoiding long-term churn of the heap that would cause actual compacting GC events during the timing runs, they are able to sustain the benchmark long enough to get a throughput measurement, and then intentionally ignore the inevitable collapse that happens when a full GC event occurs outside of the contained timing windows. However, as useless as these benchmarks are for deducing sustainable real world application behavior under load, the work they produce can be measured in allocation per OP (which several surveys have done), and that data can be therefore used to project object allocation (and garbage generation) rates on some of the attempted workload.
SPECjbb2005 allocates at a rate of ~0.01MB/OP. A single Intel Xeon core sustains about 70K-80K OPs/sec (according to recent results published by multiple vendors), which translates to a per core allocation rate of 700-800MB/sec.
SPECjAppServer2004 allocates at a rate of ~0.55MB/JOP. A single Intel Xeon core sustains about 430 JOPs/sec, which translates to 236MB/sec.
Where a JOP is the operation metric in the SPECjAppServer2004 benchmark (I guess it stands for Java operation) - see www.spec.org. It is a unit count that means nothing outside of its benchmark, but can be used to compare results within the same benchmark.
These per OP numbers are pulled from here.
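The per-core figures quoted above are a straightforward product of allocation per operation and operations per second. A minimal sketch of that arithmetic follows; the function name is hypothetical and the inputs are simply the numbers quoted above, not independent measurements.

```python
def alloc_rate_mb_per_sec(mb_per_op: float, ops_per_sec: float) -> float:
    """Projected allocation rate: MB allocated per operation times operations per second."""
    return mb_per_op * ops_per_sec


# SPECjbb2005: ~0.01MB/OP at 70K-80K OPs/sec per core (figures as quoted above)
specjbb_low = alloc_rate_mb_per_sec(0.01, 70_000)
specjbb_high = alloc_rate_mb_per_sec(0.01, 80_000)

# SPECjAppServer2004: ~0.55MB/JOP at about 430 JOPs/sec per core (figures as quoted above)
specjappserver = alloc_rate_mb_per_sec(0.55, 430)

print(f"SPECjbb2005: {specjbb_low:.0f}-{specjbb_high:.0f} MB/sec per core")
print(f"SPECjAppServer2004: {specjappserver:.1f} MB/sec per core")
```

The SPECjAppServer2004 product is 236.5MB/sec, which the quote gives as 236MB/sec.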
Azul Systems solved the Garbage Collection pause problem with direct support for write and read barriers built into their hardware. This, as Dr. Cliff Click explained to InfoQ recently:
...allows you to switch to a simpler GC algorithm - and a simple algorithm can be more easily made parallel, scalable, concurrent and robust. We switched algorithms years ago and our GC easily handles heaps (and allocation rates) an order of magnitude larger than any competition.
Tene explained, though, that even commodity hardware, specifically the latest chips from Intel and AMD, now has good support for managed loads, and this in turn enables the Azul GC algorithm to be applied to both processors:
Specifically, for Intel this means chips that include the EPT (Extended Page Table) feature (which first appeared in Intel's Xeon 55xx, and later in Xeon 56xx, 65xx, and 75xx chips) and AMD chips that include NPT (aka AMD-V Nested Paging). These new Virtual Memory architecture features (EPT and NPT) have made supporting our GC algorithm, with its read barriers and high sustained rates of virtual memory mapping changes, [possible] on commodity x86 platforms. Vega processors included a custom read barrier instruction that included bit field checking in reference metadata as well as special virtual memory protection for GC-compacted pages. Our x86-based JVM performs the semantic equivalent of Vega's read barrier operation using multiple x86 instructions, which in conjunction with using the x86 virtual memory subsystem to remap and protect GC-compacted pages achieve the same read barrier effect, and maintain the same algorithmic invariants needed for the Pauseless GC algorithms to work. The "read barrier" set of instructions is emitted by the JIT compilers and efficiently interleaved into the regular instruction stream (plenty of room for them on these modern 4 issue x86-64 core pipelines), and the virtual memory manipulations use new OS APIs that are now needed to keep up with the tremendous virtual memory mapping change rates (which are more than 100x the levels that most OSs can sustain). The good news is that with EPT/NPT and robust translation look-aside buffer (TLB) support available on modern x86-cores, we are able to easily sustain the rates needed to keep up with 10s of GB/sec of allocation rates - it's just the software stack (such as the OS kernels) that need to be improved to deal with these rates - which is where our Managed Runtime Initiative comes in.
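The read barrier idea described in the quote can be illustrated with a toy model. This is only a conceptual sketch and not Azul's implementation: the real barrier is a short sequence of JIT-emitted x86 instructions working together with virtual memory protection, whereas here a Python set stands in for page protection and a dict stands in for the relocation information. All names (ToyHeap, compact_page, load_reference) are hypothetical, and repairing the stale reference in place is an assumption of this sketch rather than something stated above.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4  # slots per toy "page"; an arbitrary illustrative value


@dataclass
class ToyHeap:
    objects: dict = field(default_factory=dict)         # address -> payload
    forwarding: dict = field(default_factory=dict)      # old address -> new address
    protected_pages: set = field(default_factory=set)   # pages already compacted

    def page_of(self, address: int) -> int:
        return address // PAGE_SIZE

    def compact_page(self, page: int, dest_base: int) -> None:
        """Move every object on `page` to consecutive addresses starting at
        `dest_base` (assumed to be free and on a different page), record where
        each object went, then mark the old page as protected."""
        old_addresses = sorted(a for a in self.objects if self.page_of(a) == page)
        for offset, old in enumerate(old_addresses):
            new = dest_base + offset
            self.objects[new] = self.objects.pop(old)
            self.forwarding[old] = new
        self.protected_pages.add(page)

    def load_reference(self, holder: dict, name: str) -> int:
        """Toy read barrier: check every loaded reference before it is used."""
        address = holder[name]
        if self.page_of(address) in self.protected_pages:
            address = self.forwarding[address]  # follow the relocation
            holder[name] = address              # repair the stale reference (sketch assumption)
        return address


heap = ToyHeap()
heap.objects[5] = "order-42"
holder = {"ref": 5}                               # a field still pointing at the old address

heap.compact_page(heap.page_of(5), dest_base=100)  # the collector moves the object
addr = heap.load_reference(holder, "ref")          # the application loads the reference
print(addr, heap.objects[addr])
```

The point of the sketch is that the application never observes a reference into a compacted page: the check on load redirects it to the object's new location, which is the invariant the quote says the x86 instruction sequence and page protection maintain together.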
The Managed Runtime Initiative aims to take a holistic approach. It focuses on scalability and response time improvements, looking to enhance interfaces across vertical component and system stacks (e.g. runtime, kernel, OS, hypervisor). The project is being seeded with a reference implementation that includes enhancements to OpenJDK (Java version 6) and an enhanced set of loadable Linux kernel objects (LKOs) or modules that expose new functionality and interfaces, both released under GPLv2.
For the Linux kernel, Azul are releasing their GC-optimised memory management, policy enforcement capabilities and a new resource scheduler compatible with Red Hat Enterprise Linux 6, Fedora 12, and SUSE. For OpenJDK, the release includes a new JIT compiler, their Pauseless Garbage Collector, and their scalable runtime. Azul Systems told InfoQ that the combined JVM and Linux enhancements are able to provide a 100x improvement in runtime execution, and a two-orders-of-magnitude increase in object allocation rate (and supported heap size).
The initiative has received support from James Gosling, the inventor of the Java programming language. From the press release:
I'm excited about the Managed Runtime Initiative and the contribution Azul is making to the Open Source community. Managed runtimes have come a long way since the mid 90s. However, the rest of the systems stack has not evolved to meet the needs of these now pervasive application environments. This Initiative will serve to bring new functionality to the systems stack, freeing managed runtimes to continue their growth and evolution.
The decision to open source key parts of their intellectual property is a bold one. Azul Systems is growing rapidly, recently announcing record first quarter bookings with revenue up 64% on the previous quarter. The hope is to get support from partners, ISVs and vendors to move the state of the art forward for other platforms and runtimes such as .NET on Windows, Ruby, and Python. A second goal for Azul is to see a commercial product comprising an optimised Linux and OpenJDK, though this will depend on vendor participation and support.
Todd Montgomery Dec 19, 2014