
Ari Zilka on Ehcache BigMemory

by Srini Penchikala on Nov 18, 2010

Ehcache BigMemory provides an in-process off-heap cache to store large sets of data closer to the application. Terracotta last week announced the general availability of the BigMemory module for its Enterprise Ehcache product. BigMemory is part of the standard Ehcache API and is enabled by defining two new attributes on a cache, overflowToOffHeap and maxMemoryOffHeap, as shown in the following configuration snippet.

<cache name="sample-offheap-cache"
    maxElementsInMemory="10000"
    eternal="true"
    memoryStoreEvictionPolicy="LRU"
    overflowToOffHeap="true"
    maxMemoryOffHeap="1G"/>

BigMemory differs from traditional caching solutions in its memory storage strategy. It avoids Java Virtual Machine (JVM) garbage collection (GC) problems by not storing the data on the Java heap. This extra BigMemory store is referred to as the Off-Heap Store. Traditionally, caching solutions have sought to avoid these issues by distributing the data over a cluster of caching nodes. BigMemory provides a new architectural alternative: an application can run on a JVM with less than a gigabyte of heap while using off-heap memory for fast access to data.

InfoQ caught up with Ari Zilka, CTO of Terracotta, about the new BigMemory feature of the Ehcache framework, the use cases where it helps application performance, and its limitations.

InfoQ: What was the main motivation behind the development of the BigMemory feature in the Ehcache framework?

The primary motivation was to solve GC issues we were having in the Terracotta server. GC in the server caused variation in response times and, when a long GC pause occurred, could cause cache clients (L1s) to fail over to a backup Terracotta server. Once we realized how good the solution was, we expanded its use to also include an additional memory store for standalone Ehcache, which became BigMemory, an add-on for Enterprise Ehcache.

InfoQ: Can you discuss the technical details of how the Off-Heap Store (BigMemory) avoids the traditional complexities of Java garbage collection?

BigMemory stores its cache objects outside of the Java heap but still inside the JVM's operating system process. So it is still an in-process cache, with all of the high performance associated with that, but it does not use the heap and therefore allows applications to be configured with very small heaps, thus avoiding GC issues. BigMemory uses DirectByteBuffers, which were introduced into Java in JDK 1.4. All Java implementations can run BigMemory, so everyone can use BigMemory without the need to change JDKs.
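
To make the direct buffer mechanism concrete, here is a minimal sketch that uses only the standard java.nio API. It is not BigMemory code: the class name DirectBufferDemo and its two helper methods are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: storing bytes outside the Java heap with a direct buffer.
final class DirectBufferDemo {

    // Copies a string into newly allocated off-heap memory.
    static ByteBuffer storeOffHeap(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip(); // switch the buffer from writing to reading
        return buffer;
    }

    // Copies the bytes back onto the heap and decodes them.
    static String loadFromOffHeap(ByteBuffer buffer) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.duplicate().get(bytes); // duplicate() leaves the caller's position untouched
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = storeOffHeap("cached value");
        System.out.println("direct: " + buffer.isDirect());
        System.out.println("value:  " + loadFromOffHeap(buffer));
    }
}
```

The bytes held by the direct buffer live outside the garbage-collected heap; only the small ByteBuffer wrapper object is a heap object.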

We pretty much perform the function of an Operating System memory manager. We then allocate memory on put and free it on remove, something we can do because we are a cache, rather than a general purpose Java program. DirectByteBuffers are slow to allocate, but very fast to use. We therefore grab all the memory we need from the operating system right at startup.

The key to BigMemory, and the thing many people find hardest to understand initially, is how we are able to tell when an object is no longer being used so that the associated memory can be freed. Well, for a cache, it is dead simple. A map is basically puts, gets and removes. We allocate memory on put (malloc) and we free memory on remove (free). We implement a Memory Manager, which leverages well-understood computer science algorithms combined with our own proprietary enhancements for doing so.
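
The allocate-on-put, free-on-remove idea can be sketched roughly as follows. This is an illustration under simplifying assumptions (fixed-size slots, a single thread, no eviction), not Terracotta's proprietary memory manager. SlabStore and all of its members are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: fixed-size slots carved out of one direct buffer.
final class SlabStore {
    private final ByteBuffer slab;  // all off-heap memory, reserved once at startup
    private final int slotSize;
    private final Deque<Integer> freeSlots = new ArrayDeque<>();
    private final Map<String, int[]> index = new HashMap<>(); // key -> {slot, length}

    SlabStore(int slotCount, int slotSize) {
        this.slab = ByteBuffer.allocateDirect(slotCount * slotSize);
        this.slotSize = slotSize;
        for (int i = 0; i < slotCount; i++) {
            freeSlots.push(i);
        }
    }

    // "malloc" on put: claim a free slot and copy the value into it.
    boolean put(String key, byte[] value) {
        if (value.length > slotSize) {
            return false;
        }
        remove(key); // release the old slot when overwriting a key
        Integer slot = freeSlots.poll();
        if (slot == null) {
            return false; // store is full; a real cache would evict here
        }
        ByteBuffer view = slab.duplicate();
        view.position(slot * slotSize);
        view.put(value);
        index.put(key, new int[] {slot, value.length});
        return true;
    }

    // Copies the stored bytes back onto the heap, or returns null if absent.
    byte[] get(String key) {
        int[] entry = index.get(key);
        if (entry == null) {
            return null;
        }
        byte[] out = new byte[entry[1]];
        ByteBuffer view = slab.duplicate();
        view.position(entry[0] * slotSize);
        view.get(out);
        return out;
    }

    // "free" on remove: hand the slot back to the free list.
    boolean remove(String key) {
        int[] entry = index.remove(key);
        if (entry == null) {
            return false;
        }
        freeSlots.push(entry[0]);
        return true;
    }

    int freeSlotCount() {
        return freeSlots.size();
    }
}
```

Because the cache itself decides when an entry is removed, no reachability analysis is needed to know that a slot can be reused, which is the point made in the answer above.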

Responding to a question on the use cases where BigMemory best helps application performance (read-only, read-mostly, or read-write), Ari said that they saw good results both for the common 90% read / 10% write workloads and for write-heavy 50% read / 50% write workloads. The reason is that the cache is in-process. The read ratio matters mainly for distributed caches, where the hot set can be read much more quickly than the rest, which must be fetched over the network.

InfoQ: What are the limitations of the BigMemory solution?

Given the fact that it is pure Java, in-process, and compatible with all common JVMs and containers, it does not have any obvious limitations. We have tested it on the largest memory boxes we could find - with 384GB of memory - and shown that we have linear performance with no noticeable increase in latency all the way up to 350GB of BigMemory.

The only constraint we highlight to users is that using an off-heap store imposes the requirement that objects must be serialized to be placed in BigMemory. For the types of data that are normally stored in a cache, this is not a problem.

Once an object has been serialized, it must be deserialized back into the Java heap before it can be used. This does involve a performance overhead. Thus, garbage collection aside, BigMemory is slower than on-heap storage. It is, however, much faster than the next available tier of storage (be it local disk, network store, or going back to the original system of record - such as an RDBMS - for the data).
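
The round trip described above might look like this with standard Java serialization and a direct buffer. It is a sketch of the general technique only: SerializedRoundTrip and its methods are hypothetical names, and nothing here reflects BigMemory's internal implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;

// Illustrative only: serialize to off-heap memory, deserialize back to the heap.
final class SerializedRoundTrip {

    // Serializes the value and copies the resulting bytes off-heap.
    static ByteBuffer toOffHeap(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] data = bytes.toByteArray();
        ByteBuffer buffer = ByteBuffer.allocateDirect(data.length);
        buffer.put(data);
        buffer.flip();
        return buffer;
    }

    // Copies the bytes back onto the heap and rebuilds a fresh object.
    static Object fromOffHeap(ByteBuffer buffer) {
        byte[] data = new byte[buffer.remaining()];
        buffer.duplicate().get(data);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("a", "b"));
        Object copy = fromOffHeap(toOffHeap(original));
        System.out.println("equal content: " + original.equals(copy));
        System.out.println("same instance: " + (original == copy));
    }
}
```

Each read produces a new heap object with the same content as the original, which is where the deserialization cost comes from.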

It should also be noted that the serialization/deserialization performance overhead is much lower than many users assume. BigMemory is optimized for byte buffers, and has built-in optimizations for objects serialized using standard Java serialization. For example, with our optimizations between the alpha and GA releases, we were able to double the performance for complex Java objects, and quadruple performance for byte arrays - which is also how the Terracotta Server Array stores data. Using custom serialization can reduce this overhead even further.
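
As a rough illustration of why custom serialization can reduce overhead, the sketch below compares the encoded size of one small object under standard Java serialization and under a hand-written encoding. The Quote class and the SerializationSizes helper are hypothetical, and the comparison measures only encoded size, not BigMemory's own optimizations or any timing.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Illustrative only: encoded size of standard vs. hand-written serialization.
final class SerializationSizes {

    // Hypothetical cache value used only for this comparison.
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final double price;

        Quote(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    // Standard Java serialization: writes a stream header and class metadata.
    static byte[] standard(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(q);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hand-written encoding: writes only the two field values.
    static byte[] custom(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(q.symbol);
            out.writeDouble(q.price);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Quote q = new Quote("ACME", 12.5);
        System.out.println("standard bytes: " + standard(q).length);
        System.out.println("custom bytes:   " + custom(q).length);
    }
}
```

For this value the hand-written encoding is 14 bytes (a 2-byte length prefix, 4 bytes of text, and an 8-byte double), while the standard form is larger because it also carries the stream header and a class descriptor.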

InfoQ asked a question about the best practices and gotchas that architects and developers should keep in mind when using BigMemory in their applications. Ari said that all successful business applications face scale limitations. Caching is one of the least disruptive and easiest to implement solutions. The new part is that now this does not necessarily involve a caching cluster.

The best practice is to look at your performance architecture with new eyes and see if you can benefit from very large in-process caches. BigMemory lets architects optimize the server and process density to meet their specific needs, rather than being held hostage to the limits of Java.

The biggest gotcha is that most people have already optimized for the limitations of Java. For example, the majority of Ehcache users still run 32-bit JVMs. 32-bit Java has an address space, depending on the OS, of 2 to 4GB. So these users have given up on using lots of memory with Java. Chances are they are currently running on hardware with small amounts of RAM. So, if they want to use BigMemory to run 100GB of cache in-process, it probably means new (albeit now cheap) hardware.

InfoQ: What is the future road map of Ehcache framework in general and BigMemory in particular?

Work on the next release of Ehcache and Terracotta, code named Freo (an Australian nickname for Fremantle), is already well underway, with a beta release planned this month. We plan to include a series of feature and performance enhancements. One example is Ehcache Search, which provides Ehcache users the ability to search a cache like you can a database. An alpha release of the code and full documentation of Ehcache Search are already available.

For BigMemory, we are continuing to work on enhancing performance, as well as making a series of practical enhancements, such as more tooling to help people better understand the optimum configuration to set for their use case.


Difficult to see how GC is reduced by Morgan Creighton

Bringing the determinism of malloc and free to Java is fascinating! But, it's difficult to see how Garbage Collection is reduced. Since I have to deserialize data from the off-heap store back into objects, won't those very objects still have to be gc'ed?

Is the gain that a smaller sized heap has shorter gc pauses? So we're improving the responsiveness of the app, but not lessening the total work that gc must do?

It's plain that an in-memory cache enjoys better performance than going to disk. But it's not completely clear to me why the off-heap store is better than a huge heap. I'm intrigued.

Re: Difficult to see how GC is reduced by Vladimir Atehortúa

GC is only reduced if the objects stored in BigMemory are not ultra-active, constantly-mutating objects. If they were, constantly serializing and deserializing them would be way more costly than the cost of GC. Also, like you said, this only makes any sense if only a relatively small sample of those "BigMemory Objects" will be on the heap at any time.

This reduces the use cases where BigMemory actually reduces GC time, but nevertheless, those cases do exist.

There are situations in which the benefit of "having some cached data in memory" is often lost because said cached data is huge, meaning you need a huge heap, which takes eons to be GC'd, because of millions of complex (composite) objects that have to be traversed for reachability again and again.

BigMemory's big save comes from saving the GC from having to traverse those millions of composite objects. However, as you point out, it is only truly beneficial under particular circumstances:

* Probably only works best if the cached objects are treated as immutable.
* At any given time, only a few of those objects can be back live in the heap.
* Any object taken from BigMemory into the heap has to have a short lifespan there, and must not be added to any long-lived composite object (i.e., it must never get out of the young generation).

Basically, caching is the right use case for this.
I actually believe that BigMemory might work, but we'd be better served by a new specialized Garbage Collector (I've had success dealing with "huge heap, large GC pause" problems using the newer concurrent GC).

Re: Difficult to see how GC is reduced by Steve Harris

Good Questions. Let's see if I can take a crack at them (Disclaimer, I work at Terracotta):

" Since I have to deserialize data from the off-heap store back into objects, won't those very objects still have to be gc'ed?"

Java has a generational GC that is extremely efficient at managing short-lived objects. What you'll find is that objects created and destroyed in one cycle of the eden space and never promoted to the next generation have virtually no impact on an application.


"Is the gain that a smaller sized heap has shorter gc pauses? So we're improving the responsiveness of the app, but not lessening the total work that gc must do?"

Yes and no. Yes, one gets an improved SLA (the ability to meet the max latency requirements of the app) by running with a smaller Java heap, leading to shorter pauses. But we are also lessening the load on the kind of GC that is hard: cleaning up the spaces above eden.

"It's plain that an in-memory cache enjoys better performance than going to disk. But it's not completely clear to me why the off-heap store is better than a huge heap. I'm intrigued."

The big boost is predictability and simplicity of tuning. Most Java applications use a small percentage of the CPU and memory of the boxes they run on. So it's a trade-off between a little extra CPU usage and being able to keep an entire data set in process. With BigMemory you get an application that runs faster due to the data closeness (much lower latency than going to the DB or recalculating cached data) and you keep pause times under control without tons of fragile tuning. It has a lot of parallels with how processors have L1 and L2 caches. L1 caches are fastest but L2 caches are also fast: en.wikipedia.org/wiki/CPU_cache#Multi-level_caches

Re: Difficult to see how GC is reduced by Morgan Creighton

Thank you for sharing those insights, Vladimir. I also wonder how BigMemory compares against other GC approaches. Concurrent Mark Sweep is very nice, but I have even higher hopes for the new G1 collector.

Your comments about short lifespan are interesting. I suspect the penalty of constant serialization/deserialization is worth paying precisely for very short lived objects, such as some "long tail" data needed for a single web request.

Re: Difficult to see how GC is reduced by Morgan Creighton

Thanks for your thoughtful and patient response. Objects that don't live long enough to get into survivor spaces don't have to be copied, don't have to have pointers to them rewritten, and so forth. Avoiding that work in garbage collection makes up for the extra work deserialization requires. Brilliant!

I appreciate that tuning is a real pain point, which BigMemory alleviates. But also, it seems one must be careful when designing the application code, to make sure we "let go" of the deserialized objects quickly.

Re: Difficult to see how GC is reduced by Cameron Purdy

"Concurrent Mark Sweep is very nice, but I have even higher hopes for the new G1 collector."


With Concurrent Mark and Sweep in 1.6, we suggest a range of 4-8GB for a JVM, and a typical app can avoid noticeable GC pauses. We even have customers running 32GB heaps with CMS with no pauses.

With G1, our expectations are much higher. The technology behind the G1 collector will allow us to push heap sizes over 100GB, and for use cases like caching read-only data the heap sizes should be able to go much larger.

Peace,

Cameron Purdy | Oracle Coherence
coherence.oracle.com/

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.