# Terracotta's BigMemory Aiming to Eliminate Garbage Collection for Java Caches

by Charles Humble on Sep 14, 2010


Terracotta is the latest vendor to try to address the problem of garbage collection pauses in Java applications. The GC pause problem is particularly pertinent to applications that make heavy use of caching. Many collectors make a generational distinction between old and young objects, handling the younger generation concurrently but falling back to a stop-the-world pause for handling the older generation. By keeping more long-lived objects in memory, a cache can exacerbate the problems that occur when these long-lived objects have to be managed directly. Terracotta's solution is BigMemory™ for Enterprise Ehcache, which uses its own memory management system specifically designed for the product.

"Developers today use time-consuming techniques to address large data sets – for example, when using lots of VMs with small heaps," said Ari Zilka, CTO of Terracotta, in a statement.

BigMemory for Enterprise Ehcache makes the black art of GC tuning a thing of the past. Companies can fully utilize the capacity of modern servers to achieve the performance gains of in-memory data while simultaneously consolidating the number of servers in the data centre.

BigMemory can be seen as a competitor to Azul's Zing product, which brings their pauseless garbage collection to Intel and AMD based servers. The two products, however, take very different approaches. Whilst Azul's solution uses software techniques to provide a garbage collection algorithm which runs concurrently with the application, and therefore requires Azul's JVM, BigMemory aims to reduce the load on the garbage collector by managing the data placed in the cache off heap, much as you might with a program written in C. As such, applications that are not currently using a cache will require code changes, but conversely, for an application already using a cache, such as a Hibernate cache, the JVM does not need to change.

InfoQ spoke to Amit Pandey, Chief Executive Officer at Terracotta, to find out more about the product. Pandey explained that whilst Terracotta's Ehcache supports both single and multi-nodes, around 80-85% of Terracotta's users were using a single node cache. Whilst these customers might not yet feel ready to jump to a fully distributed architecture they do have issues of scale and performance. For these users BigMemory offers an alternative.

"What they are running into is that when they try to expand the size of the data set that they put in memory or on heap, they are running into garbage collection issues and performance issues. So therefore they are restricted to using a fairly small footprint."

Pandey told us that initially Terracotta had needed to solve the GC problem for their own Java-based server and took the decision early this year to develop their own memory manager, still written in Java, which is able to side-step the garbage collector. Having done so they decided to integrate it into the standard Ehcache product and release it for sale in the market. According to Pandey, whilst most customers struggle to get a heap to 4GB or so:

For the Java world we're offering the ability to put a lot of their data into Ehcache. We've tested out to well over 100GB and we see a flat line when it comes to response times and SLA and maximum GC pauses, because basically we don’t do GC pauses any more. So if your application is doing a 1 second GC at a 1 GB heap and you put Ehcache in and put things off heap and, still in memory, into Ehcache you can go well over 100GB and GC time remains exactly the same.

The following chart, courtesy of Terracotta, is broadly conceptual but modeled after a real application, and illustrates the point.

We went on to discuss a little around how the memory manager works. Pandey told us:

...We're not doing garbage collection. We're doing memory management very much the way it would be done in other languages. What we're doing is putting things into our data structure which is a flat data structure. So I don't have all the generational issues and so on and so forth, the app is taking care of those things. So you've got some stuff on heap and that is handled the way it is normally handled, but you can make the heap size very small and then put everything else into our data structure and we handle the management of that. What we've done is some very clever algorithms that take care of how we handle fragmentation and issues like that, and because we're basically doing it off-line we're not slowing the application down.
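To make the idea of a flat, off-heap data structure more concrete, here is a minimal Java sketch. It is not Terracotta's implementation, which the article does not describe in detail; the class name `OffHeapStoreSketch` and its methods are hypothetical. It assumes the simplest possible design: values are appended to one block of native memory reserved with `ByteBuffer.allocateDirect`, and only a small index of offsets lives on the heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not Terracotta's BigMemory implementation. */
public class OffHeapStoreSketch {
    // One flat region of native memory; its contents are not scanned by the collector.
    private final ByteBuffer region;
    // Small on-heap index: key -> {offset, length} into the region.
    private final Map<String, int[]> index = new HashMap<>();

    public OffHeapStoreSketch(int capacityBytes) {
        this.region = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Appends the value's bytes to the region; returns false when the region is full. */
    public boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > region.remaining()) {
            return false; // a real store would evict or compact here
        }
        int offset = region.position();
        region.put(bytes);
        index.put(key, new int[] {offset, bytes.length});
        return true;
    }

    /** Copies the bytes back on heap and rebuilds the value, or returns null if absent. */
    public String get(String key) {
        int[] slot = index.get(key);
        if (slot == null) {
            return null;
        }
        byte[] bytes = new byte[slot[1]];
        ByteBuffer view = region.duplicate(); // independent position, same memory
        view.position(slot[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffHeapStoreSketch store = new OffHeapStoreSketch(1024);
        store.put("user:42", "Ada Lovelace");
        System.out.println(store.get("user:42")); // prints "Ada Lovelace"
    }
}
```

The sketch never reclaims space: overwriting a key simply abandons the old bytes. Handling that reclamation and the resulting fragmentation is the part Pandey attributes to Terracotta's own algorithms, and nothing here should be read as a description of them.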

Whilst the target market for BigMemory is mainly people who are not ready to build a fully distributed architecture, the product does work with the distributed cache as well as the single node.

The product is currently in beta with a GA release expected in October. Pricing information should be available nearer the release date.


BigFlop: Any DataGrid has it and had it for the last 5 years...

How is this different from what rest of the industry had for ~ 5 years (starting with Coherence, and then GigaSpaces, Infinispan and GridGain)? More specifically, how is this different from any distributed partitioned cache when data is kept on remote nodes and there's obviously no GC induced by cached data and most of the GC happens on young generations anyways.

Furthermore, it's all bolted on Ehcache which is a generation behind on features comparing to something like Coherence or even newcomers like Infinispan and GridGain.

Unless I'm missing something - this all smells like a very awkward infomercial repackaging an existing technology (that Terracotta finally developed too) under a new, extremely confusing name.

I would really like to see a real technical article explaining pros and cons - and not a BS on Ari's blog or "secrecy". I mean, c'mon - this is not a high-school anymore :)

Nikita Ivanov

GridGain = Compute + Data + Cloud

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

This is indeed something new. It is in-process but off-heap. We have just released the technical documentation - ehcache.org/documentation/offheap_store.html

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

> This is indeed something new. It is in-process but off-heap.

I think what Nikita meant was that Coherence had it in 2002, etc.

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

Greg,
I get the technicality (near cache with a large array-based storage of serialized data). But I think you are in for a big surprise if you think you are alone (or even first) doing it :)

Thanks,
Nikita.

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

DirectByteBuffer has existed since Java 1.4. That part is not new. The BigMemory implementation does a lot of clever stuff to provide excellent performance and scalability characteristics within the context of Ehcache. We have tested this on Cisco UCS boxes up to 350GB with excellent results. We have also done extensive competitor comparative testing. We believe, based on our testing, that our BigMemory delivers on the promise of a scalable in-process but off-heap cache storage, whereas the competitors we have tested do not.

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

I think the only way to show what the solutions really are worth is to create a set of benchmarks using Coherence/Terracotta/Gridgain etc. Without benchmarks, it will always remain a 'your products sucks and our product is better' mud throwing competition. This gives us some feedback on some technical measurements.

Peter Veentjer
Multiverse: Software Transactional Memory for Java
multiverse.codehaus.org

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

I have to agree with Nikita here, this has been done already, nothing new. This basically solves the technical problem of having a smaller set of "masters" in the terracotta model, with clients actually forming the collocation aspect (collocation in terracotta is possible on the client side, with some low level knowledge by the developer).

Also, you need to serialize and deserialize the objects to and from the cache, and the subcontext here (aside from the performance implications) is that those objects are short lived thus allowing the young GC to do its magic. In actual collocated architecture (as in GigaSpaces) the objects are not fully serialized (but still providing transactional semantics) thus really not suffering from either... .

I also would really like to note that this marketing move (md5 of what I had for breakfast...) and the titles presented elsewhere are very misleading. I have already read comments from people asking for Terracotta to donate this to Oracle so they will improve SUN GC (or comparing it to Azul). Dear god!

Terracotta has a different distributed model than most Data Grids out there. It needs to solve some of the problems it creates in different ways, but those solutions do not really apply to Data Grids implemented in different models... . In elasticsearch for example, I have implemented off heap storage of indices from day one ;).

Clarification, I have worked for GigaSpaces. I think Terracotta is a very good technology in certain places, but really, the marketing here is a bit over the top.

-shay.banon
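The serialization cost raised in the comment above can be illustrated with a short Java sketch. It assumes plain JDK serialization and a hypothetical `Quote` value type; neither is taken from Ehcache or BigMemory. Each read from the off-heap copy rebuilds a fresh, short-lived object on the heap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;

/** Hypothetical illustration of storing a serialized copy outside the heap. */
public class SerializedCopySketch {
    /** Hypothetical cached value type. */
    static class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final double price;

        Quote(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
    }

    /** Serializes the value and copies the bytes into a direct (off-heap) buffer. */
    static ByteBuffer toOffHeap(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            byte[] data = bytes.toByteArray();
            ByteBuffer buffer = ByteBuffer.allocateDirect(data.length);
            buffer.put(data);
            buffer.flip(); // make the written bytes readable from position 0
            return buffer;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Copies the bytes back on heap and deserializes a brand-new object. */
    static Object fromOffHeap(ByteBuffer buffer) {
        byte[] data = new byte[buffer.remaining()];
        buffer.duplicate().get(data); // duplicate leaves the stored buffer's position untouched
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer stored = toOffHeap(new Quote("ACME", 12.5));
        Quote first = (Quote) fromOffHeap(stored);
        Quote second = (Quote) fromOffHeap(stored);
        System.out.println(first.symbol + " " + first.price);
        System.out.println("same instance? " + (first == second));
    }
}
```

Two reads return two distinct instances, so the objects handed to the application are temporary and would normally die in the young generation. The price is the CPU time spent serializing and deserializing on every write and read.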

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

As someone who has worked with all products mentioned at customer sites resolving performance & availability issues I find it hard to imagine that GC pauses are ever likely to be addressed by a data/grid/cache product because it's not just the longevity of potentially cached objects that causes GC pauses (significant in duration or frequency). Whilst customers might be smart enough in choosing to purchase rather than build their own data/compute grid they fail miserably in the software performance engineering and capacity management of their own code bases and resources which are outside of the scope here. The amount of temporary/transient allocation that occurs in typical Java EE request processing is outstanding and a testament to the quality of engineering in VMs. This is caused in the intermediary processing and transformation of the object domain which may or may not be cached in part. In addition memory issues are caused by extreme spikes in concurrent workload that is unregulated (or ungoverned) by resource management policies.

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

Shay's points are pretty much in line with mine (don't want to re-post my reply from TSS). I want to avoid the gang-up effect on Terracotta though... It's great to see EHcache maturing, nice feature and optimization for sure. I'm also pretty sure that Terracotta does something new in its implementation.

But again, it solves the problems largely created by Terracotta architecture itself.

Nikita.
GridGain Systems.

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

Some of the comments here seem to be focused on various clustering and distributing technologies.

Certainly the problem of GC can be addressed by using smaller heaps and using JVM stacking, distributed computing and/or clustering.

The cool thing about BigMemory is what it does for your UNCLUSTERED nodes. Using Ehcache you can max out a 350 gig box with a tiered restartable cache

OnHeap
OffHeap
Disk

and almost no GC pauses.

I think that's pretty cool and I can think of a number of products that we use every day that I wish had and used it. On top of that we integrated it into our clustering products allowing individual cluster nodes to get Big. Many of our users have been asking for big predictable cluster nodes and this delivers it.
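The tiering described in the comment above (hot entries on heap, the rest in a lower tier) can be sketched in a few lines of Java. This is an assumption-laden illustration, not Ehcache code: the class `TieredCacheSketch` is hypothetical, and the lower tier is a plain map standing in for off-heap or disk storage.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier cache; not Ehcache's tiered store. */
public class TieredCacheSketch {
    // Stand-in for an off-heap or disk tier.
    private final Map<String, String> lowerTier = new HashMap<>();
    // Small on-heap tier kept in least-recently-used order.
    private final LinkedHashMap<String, String> hotTier;

    public TieredCacheSketch(final int hotCapacity) {
        // accessOrder=true makes the eldest entry the least recently used one.
        this.hotTier = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > hotCapacity) {
                    lowerTier.put(eldest.getKey(), eldest.getValue()); // demote
                    return true;
                }
                return false;
            }
        };
    }

    public void put(String key, String value) {
        lowerTier.remove(key); // keep a single copy per key
        hotTier.put(key, value);
    }

    public String get(String key) {
        String value = hotTier.get(key);
        if (value == null) {
            value = lowerTier.remove(key);
            if (value != null) {
                hotTier.put(key, value); // promote on access
            }
        }
        return value;
    }

    public boolean isHot(String key) {
        return hotTier.containsKey(key);
    }

    public static void main(String[] args) {
        TieredCacheSketch cache = new TieredCacheSketch(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // "a" is demoted to the lower tier
        System.out.println("a hot before read: " + cache.isHot("a"));
        System.out.println("a = " + cache.get("a")); // promotes "a", demotes "b"
        System.out.println("a hot after read: " + cache.isHot("a"));
    }
}
```

In a real tiered store the lower tier would hold serialized bytes outside the heap, so that only the small hot tier contributes to collector work.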

Solving the wrong problem with the wrong solution

It feels to me that we're years back in time. Storing data off heap was popular then as memory wasn't available at large capacity and was fairly expensive. So i was wondering, now when we finally have more memory available at larger capacity and lower cost (As i noted here) is it right to move it off heap just to take advantage of it?

Perhaps the following note by Amit Pandey (TC CEO) explains it:

"Pandey explained that whilst Terracotta's Ehcache supports both single and multi-nodes, around 80-85% of Terracotta's users were using a single node cache. Whilst these customers might not yet feel ready to jump to a fully distributed architecture they do have issues of scale and performance. For these users BigMemory offers an alternative."

This works against what we've seen in GigaSpaces, Coherence and most distributed caching where we actually saw the size of the cluster grows over the years to x100 of nodes where GC can be easily managed by spreading the memory between multiple VM's without real performance impact.

IMO Terracotta needs to first ask itself why their customers are not using them in distributed mode like in most other products rather than solving a symptom of the problem.

"According to Pandey, whilst most customers struggle to get a heap to 4GB"

Like most of the data in this post i found this part to be very far from anything that i'm aware of. Many of our customers have no problem to run x10G on commodity hardware. We released our benchmark on UCS where we were able to manage x100G of data on a single VM and under extreme load (7M read/sec) - (you can read the details here).

What's also missing is the performance impact on the application when it starts swapping its data off heap (before it starts to hit GC) and the graph of that performance as i increase the capacity.

There are many more technical reasons why this is not the right approach to solve GC spikes IMO, but i'm not sure it's worth spending more time on this than i already did unless someone is really interested in that:)

Nati S.
GigaSpaces

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

I'm not refuting any of TC's claims when the application itself relies heavily on data model that has significant potential for such cache storage. I will have to wait and see the public benchmark and then make an assessment of its merit and how applicable it is to real world use cases.

For big data systems we will probably need to revisit previous approaches and architectural choices made. Google appears to be doing just that. Maybe cloud computing requires us to make such data addressable and accessible in new ways. It would be great to see new ideas on how best to address transient/persistent/reference/temporal data management needs (SQL or NoSQL) with the flexibility, delegation, and composition of our own programming data structures.

I am all for benchmarks as long as they are designed to reveal cost & behavioral aspects of implementations. Benchmarks should not necessarily be about crossing the finishing line but as a way to communicate unit costs (overhead, performance, capacity) in a way that could be mapped (to some degree) to a proposed usage and architecture. That said I do like a good benchmark shootout because it does motivate teams to focus on performance much more than has been the case.

Re: Solving the wrong problem with the wrong solution

GC can be easily managed by spreading the memory between multiple VM's without real performance impact.

You should also point out that the effect of GC can just as easily spread like wildfire across an entire cluster if there is coordinated workflow processing (at the app and/or grid level) & concurrent shared data access (locking) - at least that was the case previously with some data grid products and customer applications.

Whilst the grid runtimes have scaled to very large clusters I think we have hit a ceiling in terms of management, control and diagnostics which is partly attributed to inadequate vendor tooling (admittedly its a hard task) and the poor folks tasked with managing such sprawls - blindly.

Re: Solving the wrong problem with the wrong solution

Actually coordinated workflows have the tendency to slow things down and are therefore less likely to generate GC spikes. Map/Reduce type of workflow could be a better candidate in that regard if that is what you were referring to.

Whilst the grid runtimes have scaled to very large clusters I think we have hit a ceiling in terms of management, control and diagnostics which is partly attributed to inadequate vendor tooling (admittedly its a hard task)

Couldn't agree more and this is where we should be running the discussion rather than trying to hack around it and with that open a whole new can of worms.

Re: Solving the wrong problem with the wrong solution

I explained a bunch of what we are talking about and what we have achieved and how we tested over here:
blog.terracottatech.com/2010/09/bigmemory_expla...

Needless to say, I disagree with the other vendors. They have a vested interest in making claims and refuting ours, but the benchmark and product will speak for themselves. I am 100% confident in that (because I spoke to over 100 Java shops in the last 2 weeks from EBay and Facebook to EC2-based web games and more) and all were super excited.

All I think we need to point out here is that most of these comments are coming from Terracotta competitors and that they commented on our supposed short-comings long before Greg Luck ever posted _any_ technical documents on what BigMemory is or how it works. In other words, they were shouting into the wind.

Moving on,

--Ari

Re: Solving the wrong problem with the wrong solution

Ari, with all due respect, you should know that customers aren't likely to go to YOU and say "dude, your thing sucks."

It doesn't suck, of course, but you aren't likely to get a lot of solicited negative response. That's just life.

Personally, I think it's neat, what you've done here - but I do wonder what the long-term benefit is, even for Terracotta. Isn't this pushing the problem of heap from the JVM - which is pretty efficient at GC - to a Java-based memory manager?

Re: Solving the wrong problem with the wrong solution

Ari,
How much will it take to come down from your Ivory Tower to realize that many people actually know how it is done (geez), have done it before and some (not us) have been doing it for years?

This "blissful" ignorance on your part is what rubs people the wrong way. Needless to say that this optimization looks like a bug fix for Terracotta (due to known shortcomings of your chosen architecture).

Best,
Nikita Ivanov.
GridGain Systems.

Re: Solving the wrong problem with the wrong solution

Needless to say, I disagree with the other vendors. They have a vested interest in making claims and refuting ours, but the benchmark and product will speak for themselves. I am 100% confident in that (because I spoke to over 100 Java shops in the last 2 weeks from EBay and Facebook to EC2-based web games and more) and all were super excited.

All I think we need to point out here is that most of these comments are coming from Terracotta competitors and that they commented on our supposed short-comings long before Greg Luck ever posted _any_ technical documents on what BigMemory is or how it works. In other words, they were shouting into the wind.

Paranoid today, are we? .. I'm not sure what claims or refutations you're talking about, other than Nikita pointing out that you are not the first company to implement the off-loading of data from a Java heap using NIO. That doesn't detract from the technical solution, just from your arrogance.

Whatever.

Peace,

Cameron Purdy | Oracle Coherence
coherence.oracle.com/

Comparing TC with Azul?

:)

It's like comparing a Honda Civic with a shiny new turbo kit to a Bugatti Veyron.

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

I agree with Peter. Also, benchmarks are useless unless the benchmark framework itself is also open. Not so much as a trust issue, but more so people can run benchmarks themselves on their own hardware with their own data access patterns.

- Manik
Infinispan www.infinispan.org

Re: Solving the wrong problem with the wrong solution

Ari

All I think we need to point out here is that most of these comments are coming from Terracotta competitors and that they commented on our supposed short-comings long before Greg Luck ever posted _any_ technical documents on what BigMemory is or how it works. In other words, they were shouting into the wind.

Maybe the right question should be:

What brought you to release something half baked before Greg Luck published his results?

Even the chart on this post is conceptual.

The following chart, courtesy of Terracotta, is broadly conceptual but modeled after a real application, and illustrates the point

I can only imagine what would have been your reaction if Oracle, IBM or any other vendor would publish such statements and points to "conceptual charts" to prove their points and throw numbers all over the map without solid explanation behind them.

Nati S.
GigaSpaces

P.S i just posted a comment in your post asking for more clarification.

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

Manik,

The benchmark is coming. The key, however, is to run it under the same conditions which some frameworks may not be able to do...

The numbers we have been quoting are for 335MM objects in 1 JVM @ 350GB (1.3KB values). Although our stuff works clustered and unclustered, the point here is to test the unclustered version so in this particular test, one cannot hide the data on disk or in a datagrid or what have you--it all stays in RAM (in process in our case).

Also, I understand your interest in being able to participate in the discussion and maybe what we can do is get a call together so we can get you the framework and answer your questions directly as you work through running it.

Feel free to email me offline.

Cheers,

--Ari

Re: Solving the wrong problem with the wrong solution

Nati,

Thanks for always being passionate and caring about your product and this space. It is a trait many can respect. And you are right...there are many things other vendors post where I want to get in there and present an alternate point of you. I can empathize with your position here.

Cheers,

--Ari

Re: Solving the wrong problem with the wrong solution

Doh. Freudian slip. Should say "alternative point of view." @Floyd, comments should be editable for more than 90 seconds!

--Ari

BigMemory sounds like a passivation mechanism

imho, BigMemory shows that the JVM lacks an off-heap (but in-memory), high-speed, explicit passivation mechanism.

BigMemory is one proof such a mechanism is doable (and shows the path for the inclusion of such mechanism into the JDK ?).

Dominique
www.jroller.com/dmdevito

Re: Solving the wrong problem with the wrong solution

Hey Joe, doesn't this thread remind you of the TSS days? :)

Re: Solving the wrong problem with the wrong solution

Nikita was right at the beginning of this whole mess. Most products (including IBM's eXtreme Scale datagrid) already used some approach for reducing the live object count for better GC times with large numbers of entries in the grid. This looks pretty different than what Azul does. And yes, the way Terracotta works is very different from us, Coherence or Giga, which makes product comparisons pretty difficult.

Re: Solving the wrong problem with the wrong solution

Thanks Ari
So are you going to post my questions ?

Re: BigFlop: Any DataGrid has it and had it for the last 5 years...

Manik

I hear you. Should be releasing this Friday.

Greg

Releasing Performance Results and Benchmark source tomorrow

I have been busy the past three days working with the team to ready our performance benchmark testing framework for checkout from svn and the benchmark results we have for a variety of common scenarios. Stay tuned, I think we are now done and we will be releasing this tomorrow.

Greg

Re: Solving the wrong problem with the wrong solution

Ari

To your point i tried to get more clarification on some of the numbers that you quoted in your blog.

Interestingly enough the comments on your blog are moderated and mine didn't get published! That in and of itself is worth a separate discussion especially as it comes from an "opensource" company.

Here is a quote from the original comment that didn't get published..

<comment>

Your post refers to so many numbers that requires more clarification IMO.

"64 bit JVM is small (> 2GB) it takes 30+% more RAM than a 32-bit equivalent"

Are you suggesting that your solution wouldn't require 64bit VM to get to 385GB?

I'm also sure that you're aware of the Compressed Oops option that reduces that overhead.

"The pause times (stop-the-world) can be minutes and we have even seen hours in some cases"

I never saw something like that. Is there any test case that could prove that?

"If I read from the cache 7MM times / second I miss the point by doing a read-only test. 7MM reads per second might as well be 7 trillion. reads don't create any garbage or any challenge"

I thought that if i'm using BigMemory then reads would need to get swapped from the off heap storage into my JVM heap - In that case reads would end up creating lots of memory allocation and GC spikes. Isn't that the case?

If the 7M refers to the numbers that we published here then the actual numbers are 7.1M for reads and 2.3M for writes. If your tests show 0.5M writes that's 4 times slower than what we experienced on the same hardware. At this point it is also important to note that for apples to apples comparison GC behavior needs to be measured under the SAME throughput level, i.e. if i would run 4 times slower the chances that i would hit any GC using regular java GC, i.e. without BigMemory extension, is significantly lower if not the same.

Anyway i'll stop here as we can go through that forever. My point is not to dismiss your argument about the value of BigMemory, it sounds like it could be useful for Terracotta users who tend to run mostly on Single VM. My point is that if you're going to make it a general purpose argument it's important to understand the actual meaning of all the data points that you're referring to in order that we could run a constructive dialogue that may benefit the entire community. For example maybe there are things that could be done at the JVM level to improve the GC behavior for large memory Cache or reduce the tuning overhead which i'll have to agree is a real pain.

</comment>

Re: Solving the wrong problem with the wrong solution

it sounds like it could be useful for Terracotta users who tend to run mostly on Single VM

Ouch :) But in all fairness, off-heap allocation IS important when dealing with large RAM box. Running multiple VMs in such cases induces serialization/deserialization overhead. Now, in cloud/grid environment I rarely see boxes with large RAM - but the technique is nonetheless noteworthy.

Nikita Ivanov.
GridGain Systems.

Re: Solving the wrong problem with the wrong solution

As a rule, I don't publish any comments from other vendors on my blog. I also don't comment on vendors' blogs...it is a courtesy to you and your community. If your users want to hear from me, they can come to me is how I see it. Community hijacking or terrorism is something I decided a while ago to try to avoid.

Are you suggesting that your solution wouldn't require 64bit VM to get to 385GB?

no. Not suggesting we use a 32-bit JVM to get to 350GB. We use a 64-bit JVM on a 64-bit OS so that the process can get to that size in the operating system below the JVM. That said, we have tested on 32-bit JVMs and on PAE kernels. The benefit of BigMemory is predictable latency. The standard deviation of request latency during our test was zero. We have now been running a customer's app for over 7 days and it has yet to Full pause. It is a 50/50 read write use case and its value graphs are nearly 5KBytes when being written. I am very happy with this predictability / determinism. That's the value of BigMemory IMHO.

If the 7M refers to the numbers that we published here then the actual numbers are 7.1M for reads and 2.3M for writes. If your tests show 0.5M writes that's 4 times slower than what we experienced on the same hardware. At this point it is also important to note that for apples to apples comparison GC behavior needs to be measured under the SAME throughput level, i.e. if i would run 4 times slower the chances that i would hit any GC using regular java GC, i.e. without BigMemory extension, is significantly lower if not the same.

I do not understand most of what you are saying here. It seems you are implying your product is faster than BigMemory. Let me clarify. This test is not a performance test. It is a "predictability" test. It pushes the JVM by fragmenting intentionally and vigorously. It confirms, beyond doubt for me at least, that BigMemory is delivering the value we set out to deliver--deterministic JVM performance. We are releasing the benchmark today according to Greg. If your system outruns ours, feel free to let me and the world know. I will then, of course, work to speed up. Thus is the value of competition for the user community.

But let's be clear here. You blurred the lines between predictability and performance. BigMemory can run a 350GB JVM w/ no pauses for days, while writing 50% of the time to the cache--predictability. With respect to performance, our new customer recently ran the JBossCacheBenchFramework available at SourceForge. They tested several vendors' solutions. No changes were made to the framework by any vendor. Vendors were only allowed to change their implementations of Manik's interfaces. It was a fair test, IMHO, even though I don't like the benchmark itself. Anyways, we won, producing 50MM reads per second and 9MM writes per second from a 4-node Dell r410 cluster (less than 1/4 the capacity of 4 UCS blades). The nearest alternative to Terracotta, when run with the same transaction-isolation levels (apples-2-apples) produced 9MM reads and 100K writes / second.

Am I saying Terracotta is fastest? No. You and I and everyone reading this know that data management infrastructure like Terracotta can be handed perfect workloads or pathological workloads. One system can find a workload perfect that another finds pathological. But I think we all know that numbers can "lie" and I think it's arguably bad faith on your part to take my test's output and your test's output and make a ratio of the 2. It's a nonsensical exercise you have done here.

Anyways, just to be clear, BigMemory's value is not all-out performance--yet it is not slow either. The tiered storage that it transparently uses to manage hottest data in-heap, and the rest of the data in RAM, off-heap is quite new and fast. It uses tools that have existed a long time (since 2001 as Cameron pointed out) but assembles the tools in a unique highly valuable way.

Why is BigMemory valuable? I see it this way: According to Sun Microsystems before their acquisition, 70% of all Java apps have Ehcache in them. BigMemory is a plug-in to Ehcache that requires zero code change. If Ehcache can fit hundreds of gigs of cache data into the JVM and nearly never pause, then we are talking about moving 70% of Java apps closer to realtime. In-ram cache will always outrun the underlying data store whether that store is a web service or a database. And in-ram caching will always outrun a datagrid except in the special case where a datagrid sees perfect locality due to workload balancing--but then a grid is just a cache + HA replication. Why does it outrun a grid? because a grid has to go off-host to get data, even though that get is way faster than going to a DB in pretty much all datagrids. In-memory is fastest...hard to debate.

One last point: modern generational collectors, without BigMemory for Ehcache, are thrown two competing workloads. Caches contain long-lived data that survives many collections and ends up in the old generation, where it tends to cause full pauses in order to be scanned / defragmented. Meanwhile, containers and app logic create lots of short-lived objects that barely survive a request / response cycle. These short-lived objects sit in younggen and never cause a pause in order to be collected. If you take the older generations and keep them small or empty, younggen in the modern JVM can keep up with the garbage generated elsewhere in the app with its own pauseless younggen collections. What we have done here with BigMemory is not magic. It fixes the GC problem because it allows an app to utilize the fastest aspects of GC without tickling or invoking the slower ones at all.

Thanks,

--Ari
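To make the tiered-storage idea in the comment above concrete, here is a minimal Java sketch of a cache that keeps the most recently used entries on the heap and spills the rest into a direct (off-heap) `ByteBuffer`. It is an illustration only and not Terracotta's implementation: the class name `TieredCache`, its methods, and the append-only arena with no compaction are all assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier cache: hot entries on heap, colder ones in a direct buffer. */
class TieredCache {
    private final int heapCapacity;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, String> hot = new LinkedHashMap<>(16, 0.75f, true);
    // Off-heap arena; its contents are not scanned by the garbage collector.
    private final ByteBuffer arena;
    // Key -> {offset, length} of the serialized value inside the arena.
    private final Map<String, int[]> index = new HashMap<>();

    TieredCache(int heapCapacity, int offHeapBytes) {
        this.heapCapacity = heapCapacity;
        this.arena = ByteBuffer.allocateDirect(offHeapBytes);
    }

    void put(String key, String value) {
        index.remove(key); // any off-heap copy is now stale
        hot.put(key, value);
        spillIfNeeded();
    }

    String get(String key) {
        String value = hot.get(key);
        if (value != null) {
            return value;
        }
        int[] location = index.remove(key);
        if (location == null) {
            return null;
        }
        // Copy the bytes back on heap and promote the entry to the hot tier.
        byte[] bytes = new byte[location[1]];
        ByteBuffer view = arena.duplicate();
        view.position(location[0]);
        view.get(bytes);
        value = new String(bytes, StandardCharsets.UTF_8);
        hot.put(key, value);
        spillIfNeeded();
        return value;
    }

    boolean isOnHeap(String key) {
        return hot.containsKey(key);
    }

    // Moves least recently used entries off heap until the hot tier fits.
    // Simplification: the arena is append-only, so space is never reclaimed.
    private void spillIfNeeded() {
        while (hot.size() > heapCapacity) {
            Iterator<Map.Entry<String, String>> it = hot.entrySet().iterator();
            Map.Entry<String, String> eldest = it.next();
            it.remove();
            byte[] bytes = eldest.getValue().getBytes(StandardCharsets.UTF_8);
            if (bytes.length > arena.remaining()) {
                throw new IllegalStateException("off-heap arena is full");
            }
            int offset = arena.position();
            arena.put(bytes);
            index.put(eldest.getKey(), new int[] {offset, bytes.length});
        }
    }
}
```

With a heap capacity of 2, putting three entries pushes the least recently used one into the direct buffer, and a later `get` on that key copies it back on heap. In this sketch the keys and the offset index still live on the heap; only the value bytes are kept out of the collector's view. A real off-heap store would also need space reclamation and thread safety.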

Re: Solving the wrong problem with the wrong solution

Ari,
"According to Sun Microsystems before their acquisition, 70% of all Java apps have Ehcache in them"

Can we have a source please? :)

Nikita.

Re: Solving the wrong problem with the wrong solution

Ari

Thanks for the detailed response.

In general I would agree with the rule of thumb that you outlined. As you know, my original comment on your post started with the following note:

<blockquote>
"I rarely tend to comment into other vendors announcements but in this case your post refers to so many numbers that requires more clarification IMO"
</blockquote>

Your post was an exception for the following reasons:

1. You were referring to other products, including anonymous wins etc.
2. You were quoting numbers from my earlier comment on this thread and dismissed them with your own interpretation, which by the way hasn't been answered yet.

You can't use your blog to refer to other products and shut the door on a response, especially if your arguments become the basis for the entire discussion.

Personally I tend to trust others in the community to follow that rule of thumb without moderating them, knowing that if they choose to comment on my blog they must have had a good reason for it.

As for performance vs. deterministic latency, I get your point and I actually agree with it. My point was that if I can handle X ops/sec with, let's say, 10 ms GC spikes, then X/2 ops/sec would probably yield smaller GC spikes. So the method of comparison should measure GC under the SAME ops/sec. In other words, you may find that by controlling the throughput you get the same deterministic behavior as with offloading data off-heap. That was basically the idea behind some of the real-time VM algorithms.

In general I feel that lately there is a tendency to jump immediately to a certain implementation that can cure the world, without going through a proper analysis which may yield an alternative and potentially better approach to the same problem.

I believe that we all agree that GC can be a pain, but if the reason for that is tuning complexity, then perhaps the solution should be a policy that is geared for large caches rather than hacking the JVM completely. If it's the way GC manages the cache scenario, then perhaps the solution should be to tune the GC to handle caching scenarios differently than the way it deals with regular Java structures. That's where I believe the discussion should be, and not where it is right now, i.e. "GC tuning sucks, here is a cool solution..."

Ehcache BigMemory Performance Benchmark and Results published on ehcache.or

The Ehcache BigMemory Performance Benchmark and Results are now published on ehcache.org

Greg Luck
Founder and CTO Ehcache, Terracotta

Re: Ehcache BigMemory Performance Benchmark and Results published on ehcach

Here's a link to a blog I wrote with a bit of info and history.
