BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News Memcached surpasses EhCache and Coherence in Java Job Demand

Memcached surpasses EhCache and Coherence in Java Job Demand

Lire ce contenu en français

Bookmarks

Around January 2011, Memcached became the number one caching solution used by Java developers based on job demand trends by Indeed.com. Memcached expanded beyond its LAMP roots and has entered the world at large including Java via the popular Spymemcached Java client.  Memcached seems to have hit a sweet spot of speed, and simplicity. This is in contrast to other distributed cache vendors who have greatly expanded the feature sets of their cache solutions.

Job demand for distributed caching solutions as a whole has increased dramatically since 2007 with Oracle Coherence and Memcached leading the way. Providers such a VMWare Gemfire, and Terracotta's EhCache have also seen a lot of growth. One possible correlation to the rise of distributed cache systems is the high profile use of Memcached by organizations like Facebook, and Twitter.

While Memcached is a simple, focused and fast cache solution, traditional Java cache providers have added a lot more features over that past few years. One possible reason for Memcached growth in the Java world is its simplicity to understand and implement. Memcached is "operationally predictable" says Dustin Sallings, Memcached contributor.

Many caching providers also now classify themselves as data grids which is considered by some as a NoSQL solution. The overall growth could be some overlap in features between data grid products (like Oracle Coherence) and the NoSQL philosophy which has gained a lot of momentum. For example, JBoss Infinispan is the successor to JBoss Cache and has added a lot of data grid features along the lines of Oracle Coherence.

EhCache has also added a lot of features usually associated with a data grid after EhCache was acquired by Terracotta. Also with EhCache founder Greg Luck's recent positioning argument for EhCache to be used in use cases that one might consider NoSQL, and JBoss Hibernate OGM NoSQL solution targeted at Infinispan, one can see the crossover from distributed cache to data grid to NoSQL competitor. JBoss Infinispan, and Oracle Coherence are also listed as NoSQL solutions under the category data grids on http://nosql-database.org/. While these new features could put these cache solutions into the NoSQL market, the complexity might make them harder to understand and implement for use cases that just need a really fast, predictable distributed cache.

Data grid products, which are often used as distributed caches or started out as distributed caches, tend to have features like read through caches, write through caches, transactions, replication, data sharding, elastic support and querying just to name a few. Memcached however is a much simpler model based on a form of data sharding where the hash value of the cache key is used to calculate the server where the data is stored in RAM. Memcached is a large distributed hash map. The brains of the cache are half in the client and half in the server. Memcached itself does not have replication, transactions, or elastic support. You scale Memcached out by adding new servers with more RAM.

However, the Memcached wire protocol gets used by other implementors that do provide features like replication, and elastic support, for example, Amazon ElastiCache, Google App Engine Memcache, Gear6, Memcachedb and Membase. Even JBoss Infinispan supports parts of the Memcached wire protocol as does Caucho Resin which acts like a drop in replacement for Memcached. The Memcached wire protocol has become a de facto standard for caching with clients written for PHP, Python, Java, C#, C, Ruby and many more.

InfoQ caught up with Dustin Sallings, the implementer of Spymemcached the leading Java Memcached client, to get his perspective on the rise of Memcached in the Java world and to get some updates on the Memcached wire protocol and Membase.

InfoQ: What is your role with Memcached, Membase and Spymemcached?

I wrote Spymemcached as I thought it'd be a good fit for some needs I had at my day job. I got involved in the Memcached community shortly after that and then began contributing major features and fixes to memcached itself. At one point I managed to become the largest contributor by commits, but I've been surpassed again by some of the other contributors who are doing amazing work.

Membase came about as a way to solve problems people were attempting to solve with Memcached where Memcached was not a good fit. It implements a pattern that some users expected in a way that requires minimal changes to what they were already doing.

Membase Server is a low latency, high throughput NoSQL database technology. It builds on top of the Memcached and provides a distributed key-value data store and unlike Memcached the data is persisted, i.e., it is not just in RAM. Membase Server is 100% compatible with Memcached wire protocol which means most programming languages already support Membase since most programming languages already have Memcached clients. Membase has elastic support meaning you can add and substract Membase Servers from a Membase cluster, and Membase has the ability to rebalance the data amongst the nodes. It also provides some fault tolerant support via data replication.

In addition to working on Memcached and implementing the Spymemcached client, Dustin Sallings is an employee of Couchbase, and he works on Membase which is a NoSQL solution that builds on top of the Memcached code base and protocol. Couchbase supports Membase and Couchbase Single Server which is based on Apache CouchDB. Also in October this year, Couchbase Server 2.0 is to be released which is suppose to unite the simplicity and reliability of CouchDB, with Memcached speed and Membase elasticity

 

InfoQ: Why do you think object caching usage has skyrocketed from just a few years ago?

I guess that's a bit of a matter of perspective. It's always been pretty big where I've been sitting. But now, getting lots of RAM is cheap and anyone with a remotely interesting idea can have the entire world beating down his door in just a few days. For many applications, hitting a disk is a show stopper. They might as well be pulling backups off of tape.

InfoQ: What role do you think Memcached has had in the increased object caching usage? Why?

Memcached has been the standard object cache for a long time now because it's so simple, performs well, and is operationally predictable.

InfoQ: Job demand wise Memcached is for example the number one open source cache that Java developers use unseating Ehcache and at times even Oracle Coherence. How did Memcached jump out of the LAMP world and grab so much mind share with Java developers?

While RAM is cheap, it's not that cheap. The Memcached allocation strategy has a weakness, but once tuned, it's both rock solid, and very close to optimally efficient both in terms of time and space. More complicated software is going to be less predictable and more resource hungry, which will make for a less effective cache overall.

InfoQ: What in the design of Memcached (architecture and wire protocol) has made Memcached so successful? What additions to the Memcached protocol have you added to Membase?

The Memcached protocol is sufficiently simple that it's implemented everywhere there's a networking stack. It's just one of those things people like to throw out there when they're building stuff. That makes it universally available.

The caching itself favors predictability over features. We're often reminded of the things that it *doesn't* do, and that's where various other products try to compete, but complexity kills. We can process hundreds of thousands of operations per second per node all day every day. For us, any operation that isn't O(1), is a non-starter.

In Membase, we have a few additional features that violate this in specific ways for materializing replication streams and doing a few other scheduled background things. The Membase guarantees are different from those of Memcached so there are a few tradeoffs there.

As far as actual extensions, we've done this in a couple different ways. Firstly, we did a lot of work to make sure the Memcached core itself was modularized and reusable for everyone. We started this work before we dreamt of a Membase. The primary goal there was to minimize forks where people wanted to do slightly different things (e.g. Memcachedb, Repcached, countless others) and would do so by taking all of the codebase and tweaking until they came up with something new. In the creation of Membase, we made sure everything we wanted to do could be done with the public code itself.

The first thing we built using this extensibility was a multi-tenancy shim that we used for powering public cloud Memcached instances. To do this, we needed the ability for an engine to operate other engines as well as for the inferior engines to use the shim APIs provided by the multi-tenancy engine for anything they needed from the core. We've got production installs with thousands of separately addressed memcached "instances" with isolated stats, memory regions and key space running inside a single process. We also needed the ability for an engine itself to provide additional protocol handlers so we could dynamically spin up and tear down these instances at runtime as well as have user authentication events automatically bind the connection to the appropriate memory regions.

In a nutshell, our "extensions" were really about adding extensibility. In some cases, we did want them to be generally available for anyone who wants them and to perhaps be considered official at some point. We did this to a large degree with our "vbucket" concept (not claiming it's original, but we needed it) that we use for oversharding and as a unit of clean handoff during cluster rebalancing. This gives users the ability to actually move active memory from one node to another on a completely hot cluster with no additional client overhead.

We also created the "tap" protocol (something I'd been wanting to do for a very long time) as a general means of observing changes in the server (to "tap" it a la wiretap). We use that as a foundation to implement replication, rebalancing, ETL, backups, etc.

With the increased modularity of server and the protocol, one can imagine that Memcached will show up in even more products and solutions.

InfoQ: I saw on your blog that you implemented a memcached server node in Erlang. How many languages have you implemented the Memcached server nodes in?

I'm not actually sure. I wrote one in go  the day after Google announced the language. I haven't used that itself, but I used part of that work for other things. Mainly test code as well as an implementation of the "tap" protocol.

I also wrote one in Twisted (Python) a while back. I made an S3 backed memcached server for EC2. Not sure if I ever got around to open sourcing that. It was kind of nice.

The original implementation of the Memcached binary protocol was a Python asyncore script I wrote with an accompanying client implementation. I still use both of those to bootstrap pretty much all the other work I do.

InfoQ: Gemfire, Coherence, Ehcache Enterprise, Infinispan and many other object caching systems have queries, transactions, read through caches, write behind caches, data sharding, data replication, etc. Memcached does not have a lot of these features and yet is very successful, why? Is it because it simpler therefore easier to understand and implement?

As I mentioned above, Memcached is constrained to operations we can implement using O(1) algorithms. Everything from set to flush has approximately constant algorithmic complexity. Nothing scales up with number of items. There's no regex, tag, prefix, etc. operations because we can't do them without slowing down everything.

If you want to query, you need to index. If you index, you're a) burning memory b) having multiple things that need to be maintained when data's going in and out. You're also going to deal with lots of lock and cache pressure that affect overall performance. Besides, you have a database for that.

Transactions are similar. You're going to take a hit for any type of transaction support I've seen.

In general, you're right about complexity, though. People like features. Features like RAM, CPU, locks, and bugs. It gathers all of these things and causes you to spend your time on them instead of doing something simple. Then the users have to consider how to get their application to handle some kind of STM transaction and what to do in case of failures and all that.

It's a cache. A failure should affect performance, not contribute to incorrectness of your application.

It's a little cliche, but true: Constraints are liberating.

InfoQ: How would you delineate data grids, NoSQL and object caching?

I sort of read that as hardware, philosophy and software. :)

InfoQ: What are the most important features for a NoSQL implementation?

The ones you need and the ones you think you're going to need really soon. There are so many out there and most of them are really good at what they're designed to do.

For example, when I was working on Membase, I was secretly using CouchDB in one of my side projects because it was the appropriate tool for the job. We looked at CouchDB before starting Membase and decided it was far enough from our goals that we couldn't use the core of what it was good at. If you take me back in time a year and ask me which one is better, I'd have to ask for more details on your project.

There have been times I've recommended products from competition as well where users have just such specific needs that there's already a tool that suits them so well.

The gap may close over time, but you similarly can't ask me what the most important feature of a data structure is. Sure, they all store data, but every one I've used has failed me somewhere it wasn't optimized for. It doesn't become less so as the systems get more complicated.

InfoQ: One common problem I have seen with Memcached usage is organizations becoming so tied to it and taking so much load of the databases that they then become reliant on it in that they can't afford even one Memcached server node to go down? Since Memcache server nodes do not support data replication, what options do developers have?

I had a co-worker who once said, "If there's someone in your organization you can't afford to lose, fire that person before it gets worse." The sentiment there is similar here. Memcached is built to be disposable. If your service won't work if a node goes away, what would you ever do if the power went out? The first step is to keep these systems disposable.

Sometimes, of course, people really just want the flat data store. Some of our biggest installations of Membase used to be Memcached in front of MySQL, where MySQL was really almost a backup. There was an ad-hoc write-through cache with a lot of moving parts. They really didn't ever want to serve something that wasn't out of memory. That's where Membase fits well and can increase reliability and expandability while decreasing complexity. You now just worry about capacity planning. If you've got a fifteen node cluster and your application is so popular that taking a node offline would overwhelm the rest of the cluster, add more nodes. Keep adding nodes until you're comfortable sending a clumsy kid into your machine room with a taser. If killing a node kills your business, either find indestructible computers that are internally redundant and come with their own nuclear reactors, or just spread the load out more. Membase makes the latter easy.

In a sense Membase, and other products that support Memcached wire protocol and provide persistence and replication, is a protection for not using Memcached as it was intended to be used. Membase is also an easy on ramp for Memcached users who want to start using a NoSQL solution.

Membase was announced October 2010, and was developed by Zynga, and NorthScale, and NHN. NorthScale became Membase Inc., which then became Couchbase Inc. after merging with CouchOne Inc. Membase is used by Zynga for its popular social games, namely, Farmville, Mafia Wars, and Cafe World. Membase was optimized for storing web applications data like Farmville's data. These online social games store a lot of data. "It's a mind-boggling amount of data. It's a new sort of data, and it warranted development of a new sort of database management system (Membase)" according to Audrey Watters of ReadWrite Cloud. Zynga was already using Memcached so the transition to Membase was a natural one.

InfoQ: What third party tools and support exists to add memcache failover, to make it elastic?

Elasticity in Memcached has always been just a bit tricky. There are forks that add replication, for example, but the general model has been to just have nodes die and consider their part of the cache to be gone forever, to be rebuilt by the application as it has misses. That model works pretty well.

Adding more Memcached nodes reduces the risk and percentage of hits to the real database even if some nodes go down. 

InfoQ: Can you briefly describe vbuckets?

Vbuckets are virtual containers that live within the physical servers to make larger chunks of the service that can move around.

A management layer assigns a fixed number of vbuckets to all of your physical nodes and communicates that to your clients. Your clients, when storing or retrieving data, assign a given piece of data to a particular vbucket deterministically. The client then goes to the server that it believes holds that vbucket and attempts to perform the requested operation.

We have tools that can atomically move vbuckets from one node to another even under fire. In those cases, the above operation will continue to be performed until the very last few drops of data are squeezed out of the previous node, at which point it begins refusing operations for that vbucket, and the new node will suspend requests until it gets confirmation that it's received all the data.

In practice, we can move gigs of data around while clients are reading and writing full throttle and only the tiniest of delays is introduced at the very end once the client is informed it's gone to the wrong place and figures out where the right place is.

Vbuckets are a way for a key hashed data sharded system like Membase to move data around to other servers when adding server nodes to the cluster. In essence, it gives Membase like implementations some infrastructure to implement elasticity and replication. Of course you would need some form of replication in place to spin down nodes for failover, but not to spin new nodes up. Vbuckets is part of the wire protocol so if the client supports it, the client view of the server topology can change on the vbucket fault that says the data moved to a new node.

Dustin went on to explain vbuckets in essence is really the ability to distinguish "I don't know" from "I can't know." With previous strategies, a machine goes out and clients would ease over to other machines and start using them, getting misses and moving forward. This is fine until there are inconsistencies where the clients are talking to different servers for the same data. With the vbucket model, there is external coordination while having the servers remain completely autonomous. The servers don't know where the right server is for a particular piece of data, but they know that the client has come to the wrong place and needs to reconfigure itself. Memcached, does not need any kind of replication or transfer - just add or remove a server, and the vbuckets wire protocol addition can inform the clients that the data is not present and they can reconfigure themselves. The servers don't know the difference. For Membase, which does not have the same caching semantics, it has to transfer FSM (finite state machine) that makes it easy to reliably keep data actively flowing while atomically moving everything to a new server. It seems vbuckets were originally developed for Membase but are now part of the Memcached wire protocol and any "memcapable 2.0+ API" client can use them as long as the server nodes support it.

InfoQ How many projects/products now use the Memcached wire protocol?

I wouldn't even know how to know that. There are plenty.

InfoQ: Are there any big changes coming in the Memcache wire protocol?

The binary protocol's been stable for a while now, but it's not widely used. In the scripting web world, it's harder to take advantage of some of the things it offers. In Java, I was able to easily double throughput by making use of new binary protocol facilities.

InfoQ: How has the wire protocol evolved over the years?

In the text protocol, things have been added here and there. We've had a few additions that have opened some nice doors, and a few that have been more disappointing.

In the binary protocol, we've made things fairly regular. Some people really like text protocols. For me, I can take a binary protocol and implement the entire thing correctly end-to-end and build on it much faster. I think both parties look at the other with confusion.

In our 1.6-related work, it's easy to build extensions to either.

InfoQ: But have you ever seen the Ehcache versus Memcached performance test? You don't have to comment on this if you don't want to.

I've looked at them. All benchmarks seem to be filled with lies (though often not intentionally). I've rarely seen one where I looked at the code and didn't find obvious flaws.

There was a Memcached vs. Redis benchmark a while back that showed Redis way faster than Memcached. The programs that performed the test were quite different and it basically pointed out that a particular Redis client was faster than a particular Memcached client. These things are unfortunate.

In the case of this Ehcache tests I can find, I see similar things. Something's just not quite right. The one I find in a quick google search shows Memcached taking 10,000 sets in 3.4s. My own tests with the same client have pushed 1,000,000 sets in about 10s. I didn't see code or test scenario details, so I can't tell you what the difference is, but something feels wrong.

Also depending on the use case, the results of the benchmark can vary a lot. Using Memcached in a scenario it was not designed for could impact the benchmark result.

InfoQ: Did anyone contact you about joining the JSR-107? It would seem that a person of your background and stature would be a natural on this particular JSR.

No, I've not been involved in Java committees. There aren't really particular languages I consider my home for any long periods of time. Java builds some fairly well-thought-out specs, though many of them are impossible to efficiently implement. The majority of the JCache specification couldn't be implemented on top of Memcached, for example. My contribution to such a group would be a whole lot of "I don't think this can work well."

The last standards body I was involved in was some fairly smart people who didn't write software asking me, "but is it *possible*?" I took a sort of shameful pride in my ability to make the results of that standard actually work in spite of grotesque complexity.

Dustin commented that he was not bashing Java or JSRs or even standards bodies.

Each cache use case and problem means a particular cache system might fit best. Although Memcached might not be the best choice for all cache use cases, Memcached seems to have hit a sweet spot of speed, simplicity and operational predictability as demonstrated by the demand for it. While many vendors of caching systems are adding more features, Memcached seems to be focused on a lot fewer features. The Memcached wire protocol is supported by vendors who do provide support for replication, persistence and elasticity. The Memcached wire protocol has become a de facto standard supported by various projects, products, and PaaS & IaaS providers.

Rate this Article

Adoption
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Distibuted Cache vs Data Grids

    by Kiril Menshikov,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Memcache is simple framework to use in application. Lot of clients select Coherence or GemFire as distributed cache. That looks strange, data grids has another purpose. Suppose both Coherence and GemFire store all data in memory and access to the data is fast. But it's only one part of in-memory data grids. Whole server utilisation will be, when we start using CPU. Parallel data processing can significantly improve quality of service and give good performance. Web developers should change they mind and move all logic to data grid.

  • Why not having both Data Grid and Memcache?

    by Talip Ozturk,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Yes, data grids go beyond caching but there are many cases where a Memcache interface can help. That is why Hazelcast, open source in-memory data grid solution, supports Memcache protocol out of box. Hazelcast users enjoy the data grid features such as querying, transactions, distributed executions, data affinity, events and RESTful access along with Memcache support. I believe there are other data grids solutions with Memcache support.

    Talip Ozturk
    Hazelcast

  • Re: Distibuted Cache vs Data Grids

    by Richard Hightower,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks for your input. My intention, with the article was more about identifying a trend, and stating that Memcached is a very popular option, and showing how the options/features are different. And explain some possible reasons why Memcached is chosen by some along with some possible pitfalls. I hope I did not imply that Memcached is always a better option. It really depends on the use case, the organization, etc.

  • Re: Why not having both Data Grid and Memcache?

    by Richard Hightower,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks Talip. Yes. Hazelcast was mentioned to me by a few folks as something I should look into as well. It does not show up under job trends yet, but there was certainly a lot of buzz about it. The article also mentions that Infinispan uses memcached wire protocol to support Perl, Python, PHP, Ruby et al developers. I am working on an article on datagrids so feel free to drop me a line. :)

  • performance benchmarks

    by Robert McIntosh,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    The funny thing about that EHCache vs Memcached test is it misses one very big important point. Using an in-process cache as a main cache is incredibly fragile, borderline dumb. As soon as you bounce for a new version, update, fix, whatever of your app, your cache is gone. It also wrecks havoc with efficient garbage collection. A more correct test would have been a remote EHCache instance vs. Memcached.

  • Re: performance benchmarks

    by Richard Hightower,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    First I am not defending or promoting the benchmark at all. I have a lot of respect for Greg Luck and Dustin Sallings.

    I think it is hard to make blanket statements about Memcached or any number of the data grid solutions. Certain use cases could certainly favor one or the other solution. Depending on the use case, the results of the benchmark can vary a lot.


    Using an in-process cache as a main cache is incredibly fragile, borderline dumb.
    As soon as you bounce for a new version, update, fix, whatever of your app, your cache is gone.


    I think there is more to it than this. Bouncing your app does not mean that your cache is gone unless you configure it that way.
    Having a local cache might not be so "dumb" for some applications.
    Also, for example EhCache and Coherence, can be backed with persistent support and in-memory.
    It is a mere matter of how you configure your cache/data-grid.
    I am sure someone from Coherence or EhCache could explain it better than me.
    Also upgrading your application, is not always easy with Memcached either. It takes some thought.
    Let's say you are storing Java object graphs that are serialized in Memcached.
    Now change a few common root objects.
    What happens? I have experienced it. This is not just a problem with Memcached either.
    Upgrading applications that depend on objects caches is hard if the objects change from one version to another.
    There are things you can do for sure. Cache warming comes to mind.


    It also wrecks havoc with efficient garbage collection.


    This is the case with some in-process caches, but not all. Some cache providers have an in-memory, out of Java VM heap option. It is the speed of same server RAM without the overhead of Java's GC.


    A more correct test would have been a remote EHCache instance vs. Memcached.


    That might have been a better apples to apples test, but...
    Since they have different features, it might not have been a perfect test either.
    I think it might be better just to be aware of the shortfalls and advantages of each approach.
    Then design a custom benchmark that fits your use case and application.

    Doesn't it really depend on the application?

    I do believe: Memcached seems to have hit a sweet spot of speed, simplicity and operational predictability as demonstrated by the demand for it.
    I don't believe: Data-grids and local caches are always dumb.

    I can't believe someone could always say use solution X for every application. If that was the case, Membase would not even exist. You would not need it. It does and some people do need it.

  • Re: Distibuted Cache vs Data Grids

    by Cameron Purdy,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I think in general the differentiation is simple: Memcached is give-away software. The fact that it took 9 years for it to catch up with Coherence in adoption is pretty funny ;-)

    Peace,

    Cameron Purdy | Oracle
    coherence.oracle.com/

  • Re: Distibuted Cache vs Data Grids

    by Richard Hightower,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I am pretty sure that the Java client for memcached did not exist until around 2007and was not solid until 2008. Before that Memcached was mostly used in the LAMP and Ruby world. It might be hard to compete for 10 years if you only existed three to four years ago. Memcached itself only existed since 2003. The first few version were not used outside of the LiveJournal application. I don't think it was widely used until 2006 and then only in the PHP and Perl world (part guess, part remembering something I read thus the word "think") then later Ruby, Python....

    For all intents and purposes Memcached did not exist to the rest of the world outside of LiveJournal until 2007.

    www.google.com/trends?q=memcached&ctab=0...

    Of course, one could argue that you can't compare Memcached to Coherence directly. It is apples and oranges as far as features go.



    Peace,

    Rick Hightower

  • Reply from Greg Luck of Ehcache and Terracotta

    by Greg Luck,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Measuring popularity is hard to do with open source. I tend to think of it in terms of number of deployments. Ehcache is distributed with most frameworks and the Ehcache jar is present in the lib directory in most apps out there. I don't think that is true of the spymemcache jar. In terms of ads so many people have used Ehcache over the years that it is now almost assumed knowledge. On level of search activity I find Google trends is good. Try www.google.com/trends?q=memcache%2C+ehcache&... . It shows memcache searches as roughly twice those of Ehcache with the ratio stable for the past two years. Now that it all of the memcache user community. I think it likely that most of this is LAMP activity rather than Java activity. So I remain very unconvinced by Rick's assertion that memcache has surpassed Ehcache in popularity with Java developers.

    I think of memcache to distributed cache storage what Ehcache used to be to in-process. Very simple, dependable and easy to understand. However I think the article misses the point on simplicity. There is a category of cache I call Java Enterprise Caches. In that category you will find Ehcache, Coherence and Gemfire with others aspiring to join such as Infinispan. The feature set is similar but arguably not simple. But the enterprise users require those features. Examples are search, JTA, optimistic locking, pessimistic locking, scale up as well as scale out (think BigMemory), HA, persistence, multi data centre (WAN operation/replication), event listeners, write-behind... If you are building an app dealing with money you may well need these features. I used to think as Dustin does. But in the past few years I have met hundreds of enterprise prospects and customers and have come to understand their needs. This is also why there is a cluster of features common to us all.

    On to JSR107 it will be useless unless it provides an API for common use cases. It is aimed at enterprise Java using Spring or EE. As a result it supports much of the above. But it has a capabilities API so that an implementation can opt out of certain features. But frameworks and user code can detect those features with CacheManager.isSupported(OptionalFeature feature). I have been factoring in both memcache and open source in-process caches of which there are quite a few. If they advertise they do not support transactions, annotations and store by reference, then it is quite simple to implement JSR107. And if you are a full featured enterprise cache like Ehcache then you will support them all. We have around 15 expert group members and there should be 10+ implementations. I have contacted Dustin and am now discussing with him JSR107 with a view to either getting him to join or at least to ensure the spec is implementable by spymemcache.

    On benchmarks the old Ehcache versus Memcache compare the time taken to resolve a reference in a concurrent Map structure (Ehcache in-process) which is in the order of microseconds with a call across the network to return bytes and then deserialize them. It should be no surprise that Ehcache or any in-process cache is 1000 times faster than Memcache. But how does this work out when only part of the cache is held in process which is the case with Distributed Ehcache where the whole cache is in the Terracotta server array and a hot set only is kept in-process? Well it depends on your access pattern. If you get a nice pareto distribution to your reads then your in-process hot set could easily serve 80% of your requests. A Memcache or Terracotta or any NoSQL lookup is in the order of single digit milliseconds. So our average latency would be .8 * .001 ms + .2 * say 2ms = .401 ms versus the over the network stores which are going to be 2-8ms. So about an order of magnitude faster. In the scenario where there is no hot set then performance will be approximately the same.

    On the questions of categorisation. Are we are a data grid: no we are not. Coherence, Gemfire and Infinispan are. We are a client-server architecture with horizontal scale out of the server tier using consistent hashing which is the same category as memcache. Because we are not a data grid we are not participating in the data grid spec: JSR347. We are both an in-process cache and a distributed cache. We are commercial open source. We support both simple caching and enterprise caching. We are not a NoSQL store but we do support some NoSQL use cases. See my article on this: nosql.mypopescu.com/post/3585092735/ehcache-dis.... BTW, I will be doing a Q&A with InfoQ on the topic of Distributed Caching versus NoSQL in the next few weeks.

  • Re: Reply from Greg Luck of Ehcache and Terracotta

    by Richard Hightower,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks for taking the time to read the article and review it. Thanks for your detailed responses as well.

    Greg:


    So I remain very unconvinced by Rick's assertion that memcache has surpassed Ehcache in popularity with Java developers.


    My initial response to this was that I did not assert any such thing. Then I went back and read my summary and the title, and realize that I did.
    Ok. This was not the exact assertion that I was trying to make.

    I am going to change the title from:

    Memcached surpasses EhCache and Coherence in Java usage

    to:

    Memcached surpasses EhCache and Coherence in Java Job Demand

    The latter is more accurate, and was my intention.

    What happen was....
    The original title was very long, and I shortened it to fit. When I did this I changed the meaning in an unintentional manner.

    I am also going to change the summary from:


    Around January 2011, Memcached became the number one caching solution used by Java developers. Memcached expanded beyond its LAMP roots. ...


    To:


    Around January 2011, Memcached became the number one caching solution based on Java developer job demand. Memcached expanded beyond its LAMP roots. ...


    I spent most of my time editing the article itself, and did not spend enough time on the summary and title. My intention was not to mislead. Thanks for bringing it to my attention. I think the reader might be able to fill in the blanks after reading the article, but clarification is warranted.

    I can't make this change now because the CMS site is under maintenance, but I will wake up early and change it in the morning before I go to work.

    You said:
    Greg:

    On the questions of categorisation. Are we a data grid? no we are not. Coherence, Gemfire and Infinispan are.


    In the article I wrote:

    Rick:

    EhCache has also added a lot of features usually associated with a data grid after EhCache was acquired by Terracotta.



    I understood that EhCache does not classify itself as a data grid. The point I made was that EhCache, Gemfire, Infinispan and Coherence have a lot more overlap in features with each other than they do with Memcached. Since the article was about Memcached more time was spent explaining its features. Had the article been about data grids, more features with better explanations would have been cited.

    Greg:

    We are not a NoSQL store but we do support some NoSQL use cases. See my article on this: nosql.mypopescu.com/post/3585092735/ehcache-dis.... BTW, I will be doing a Q&A with InfoQ on the topic of Distributed Caching versus NoSQL in the next few weeks.


    Again, it seems clear in this news article that you did not consider EhCache to be a NoSQL solution, but that you thought it could be used in similar use cases.

    Rick:

    Also with EhCache founder Greg Luck's recent positioning argument for EhCache to be used in use cases that one might consider NoSQL,


    I even link to the same article that you do, i.e., your article about Ehcache: Distributed Cache or NoSQL Store?

    Later when I mentioned which cache solutions were also classified as NoSQL, I did not say EhCache was NoSQL.

    Rick:

    JBoss Infinispan, and Oracle Coherence are also listed as NoSQL solutions under the category data grids on nosql-database.org/.



    Your comment has a lot of things to consider. I am glad it is part of this news item.

  • There was an earlier article that talked about caching as well

    by Richard Hightower,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Here is a related article that covers memcached, ehcache and JSR-107 from a very different perspective.

    www.infoq.com/news/2011/08/jcache-jsr-lives



    This has caused a demand for caching as a recognized part of the architecture not just a performance patch. Greg compared use cases for NoSQL and cache solutions like Ehcache which in many respects try to solve the same problems of scale, but take different approaches. There is also a sharp increase in use of both Ehcahe and memcached in Java and caching in the industry as a whole ....

    Terracotta, where Greg works, has begun implementing the specification in Ehcache, which will include features like BigMemory, Cache Resource Reservation and will work in standalone mode as well as distributed mode. Ehcache is a well known open source Java cache, especially, if one has a background with using Hibernate.

    Ehcache, like most Java cache implementations, has a local in-memory replication mode which **provides some significant performance advantages over a purely distributed version**. While Memcached stoked interest and demand for caching solutions, faster, standards compliant implementations will likely have a place at the table as well. With a standard API, the argument goes vendors and open source implementations can compete on performance, ease of use, cloud elasticity support, scalability and more.

  • Re: Distibuted Cache vs Data Grids

    by Alex Hornby,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Simplicity is a feature sometimes.

  • Re: Distibuted Cache vs Data Grids

    by Richard Hightower,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Well said. If you want a distributed cache, the one that does not have a laundry list of features that you don't plan on using may very well look the best. Concise and focused can win the day. On the other hand, you might need those features later. There are always engineering tradeoffs.

  • memcached is okay as a side cache, but side caches for data are lame.

    by Joseph Ottinger,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    In my (biased) opinion, side caches are a problem any time they're used for live data. For rendered or temporary data, well, they're probably ideal there. This sort of thing comes up all the time.

    See www.enigmastation.com/?p=796.

  • Re: memcached is okay as a side cache, but side caches for data are lame.

    by Richard Hightower,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Your "biased" opinion is might not much different than my opinion. Although, I try to keep my opinion out of the conversation as much as possible. I quoted this before...

    www.infoq.com/news/2011/08/jcache-jsr-lives

    The above is an article that I wrote about JCache.


    This has caused a demand for caching as a recognized part of the architecture not just a performance patch. Greg compared use cases for NoSQL and cache solutions like Ehcache which in many respects try to solve the same problems of scale, but take different approaches. There is also a sharp increase in use of both Ehcahe and memcached in Java and caching in the industry as a whole ....


    Terracotta, where Greg works, has begun implementing the specification in Ehcache, which will include features like BigMemory, Cache Resource Reservation and will work in standalone mode as well as distributed mode. Ehcache is a well known open source Java cache, especially, if one has a background with using Hibernate.


    Ehcache, like most Java cache implementations, has a local in-memory replication mode which **provides some significant performance advantages over a purely distributed version**. While Memcached stoked interest and demand for caching solutions, faster, standards compliant implementations will likely have a place at the table as well. With a standard API, the argument goes vendors and open source implementations can compete on performance, ease of use, cloud elasticity support, scalability and more.


    I am finishing up a piece on Scala, and then I am working on a story on datagrids.

  • Coherence also supports memcache protocol

    by Cameron Purdy,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Here's a step-by-step to using any of the memcached clients with Coherence:
    blogs.oracle.com/OracleCoherence/entry/getting_...

    Peace,

    Cameron | Oracle

  • Interesting post

    by Danielle Felder,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    It's interesting that you say that. This VP Data Grid Engineering Lead writes in his review of Oracle Coherence, "Other technologies don’t allow the same level of scale and compute capability, while providing rigorous resilience and security underpinnings." You can read the rest of his review here: goo.gl/n7UhTQ.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT