Posted by Srini Penchikala on Oct 06, 2008
Memcached is a distributed memory object caching system used in dynamic web applications to alleviate database load. It speeds up dynamic, database-driven websites by caching data and objects in memory, which reduces the number of times the database must be read. At its core, memcached is a hashmap that stores key/value pairs. The daemon is written in C, but clients can be written in any language and talk to the daemon via the memcached protocol. However, memcached provides no redundancy (for example, replication of its hashmap entries): when a server S is stopped or crashes, all key/value pairs hosted by S are lost.
Bela Ban, JGroups and Clustering Team Lead at JBoss, recently wrote a JGroups-based implementation of memcached that Java clients can access directly. The implementation is written entirely in Java and has a few advantages over the original memcached:
The main class to start the JGroups memcached implementation is org.jgroups.demos.MemcachedServer. It creates an L1 cache (if configured), an L2 cache (the default hashmap that stores all entries), and a MemcachedConnector. The API is very simple and includes the following caching methods:
InfoQ spoke with Bela Ban about the motivation behind the JGroups implementation of memcached. He said that it allows the team to experiment with a distributed cache and see how the various caching strategies fit into JBoss Clustering. He also explained how the new memcached implementation compares with the JBossCache caching framework:
We see caching as a continuum between distributing data across multiple nodes (hosts) in a cluster (without redundancy) and fully replicating data (total replication of every data item to every cluster node). Between distribution and total replication, we have buddy replication, which replicates data to a few selected backup nodes. This can be compared to RAID, where RAID 0 has no redundancy (distribution), RAID 0+1 has full redundancy and RAID 5 has partial redundancy.
Currently, the PartitionedHashMap in JGroups provides distribution, and JBossCache provides total replication and partial replication (with Buddy Replication). The idea is to let the user define K *per data item* they place into the cluster, ranging from K=0, which means distribution (if a node hosting one or more stripes crashes, that data is gone), through K=X (where X < N), which is comparable to RAID 5, to K=N, which is total replication.
The memcached implementation in JGroups is a first step to experiment with K=0, which is pure data distribution without redundancy. This will eventually make it into JBossCache.
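To make the K-per-item idea concrete, here is a minimal, hypothetical sketch of how a cluster might decide which nodes hold a given item. It is not the JGroups or JBossCache algorithm: the class and method names are invented, it uses simple modulo hashing rather than the consistent hashing a real system would more likely use, and it assumes K counts backup copies, capping the total at N so that K=N means every node holds the item.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-item redundancy (not JGroups/JBossCache code).
// The key's hash picks a primary node; the item is stored on min(K + 1, N)
// consecutive nodes, so K = 0 is pure distribution and K >= N - 1 is total replication.
final class ReplicaPlacement {
    private ReplicaPlacement() {}

    static List<String> owners(Object key, List<String> nodes, int k) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("no nodes");
        if (k < 0) throw new IllegalArgumentException("K must be >= 0");
        int n = nodes.size();
        int copies = Math.min(k + 1, n);                   // primary plus up to K backups
        int primary = Math.floorMod(key.hashCode(), n);    // non-negative even for negative hash codes
        List<String> result = new ArrayList<>(copies);
        for (int i = 0; i < copies; i++) {
            result.add(nodes.get((primary + i) % n));      // primary first, then the following nodes as backups
        }
        return result;
    }
}
```

With four nodes, owners(key, nodes, 0) returns a single node (lose it and the item is gone), owners(key, nodes, 2) returns three distinct nodes, and owners(key, nodes, 4) returns all four. Modulo hashing keeps the sketch short, but it remaps most keys whenever the node count changes.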
Where does the memcached implementation fit in the JBoss Application Server modules?
It will be part of the Clustering subsystem, provided by JBossCache. Note that our implementation is really written with "Java" clients in mind, so we don't have to use that terribly inefficient memcached protocol, with the marshalling/unmarshalling/copying overhead.
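For context on the overhead Bela mentions, the sketch below builds the bytes of a set request in memcached's text protocol: a header line set <key> <flags> <exptime> <bytes> terminated by CRLF, then the data block, then another CRLF. It is a minimal illustration rather than a complete client; the class name is invented, it accepts only printable-ASCII keys, and it leaves out the network round trip and parsing of the server's STORED reply.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of encoding a memcached text-protocol "set" request:
//   set <key> <flags> <exptime> <bytes>\r\n<data block>\r\n
final class MemcachedTextProtocol {
    private MemcachedTextProtocol() {}

    static byte[] encodeSet(String key, int flags, int exptimeSeconds, byte[] value) {
        byte[] keyBytes = key.getBytes(StandardCharsets.US_ASCII);
        if (keyBytes.length == 0 || keyBytes.length > 250)          // memcached keys are limited to 250 bytes
            throw new IllegalArgumentException("key must be 1..250 bytes");
        for (char c : key.toCharArray()) {
            if (c <= ' ' || c >= 127)                               // this sketch only accepts printable ASCII keys
                throw new IllegalArgumentException("key must not contain spaces or control characters");
        }
        String header = "set " + key + " " + flags + " " + exptimeSeconds + " " + value.length + "\r\n";
        byte[] headerBytes = header.getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream(headerBytes.length + value.length + 2);
        out.write(headerBytes, 0, headerBytes.length);
        out.write(value, 0, value.length);   // the serialized value is copied into the request buffer
        out.write('\r');
        out.write('\n');
        return out.toByteArray();            // ...and copied once more into the returned array
    }
}
```

For example, encodeSet("greeting", 0, 60, "hello" as ASCII bytes) yields the request set greeting 0 60 5, CRLF, hello, CRLF. The value must already be serialized to bytes, is copied at least twice here, and the server has to parse it all back out; a Java client calling an in-process cache avoids those steps.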
Discussing typical use cases for the JGroups implementation of memcached, Bela said:
One use case is server-side code (e.g. servlets) running in a JBoss or Tomcat cluster that accesses a DB and needs a cache to speed things up and remove a DB bottleneck. The other use case is similar, but instead of a DB, access is to the file system; an HTML page caching server is an example (Squid comes to mind).
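The first use case is the classic cache-aside pattern: check the cache, go to the DB only on a miss, and remember the result. A minimal sketch, assuming a local ConcurrentHashMap stands in for the distributed cache and a hypothetical loader function stands in for the DB query or file read; the class name is invented for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch: the map stands in for a distributed cache,
// and the loader stands in for a (slow) DB query or file read.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent invokes the loader only on a miss; a null result is not cached
        return cache.computeIfAbsent(key, loader);
    }

    // Call after the underlying row or file changes so the next get() reloads it
    void invalidate(K key) {
        cache.remove(key);
    }
}
```

Only the first get(7) reaches the loader; later calls are served from memory until invalidate(7). One design caveat: ConcurrentHashMap.computeIfAbsent holds a lock on the key's bin while the loader runs, so a slow DB call can briefly block other updates that hash to the same bin.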
Are there any plans to introduce memcached into JBoss Application Server in the future?
Absolutely. The Data Partitioning feature will allow users to configure caching according to their needs. So having something like a distributed cache is not a new feature in itself, but a matter of JBossCache configuration. The cool thing is that this can be dynamic, so developers can decide which redundancy features (none=distribution, full=total replication or partial) they want per data item they put into JBossCache.
Regarding the future direction of the project in terms of new features, Bela listed the items on the to-do list:
The JGroups implementation of memcached and its library dependencies can be downloaded from the JGroups SourceForge website. The following command launches the program:
java -jar memcached-jgroups.jar
Bela is looking for feedback from the community. He said that this is an experimental feature but that it will become a supported feature of JBossCache, and that community input will have a great influence on its direction.
Srini Penchikala currently works as a Security Architect and has 17 years of experience in software product management.
I'm a little unclear what this has to do with memcached, actually. I think it would be interesting to see support for the "terribly inefficient" memcached protocol that the rest of the world seems to like so much. If the ASCII protocol is so terrible and inefficient, then maybe the new binary protocol? Then you could open this thing up to non-Java clients, and you'd have a real comparison to memcached and its versatility. That would be interesting.
My worry of the month lately is how Java handles multi-gigabyte heaps when dealing with cache-like semantics (i.e. lots of long-lived objects, plus lots of short-lived objects that live long enough to get out of eden because of TTL or expiration, only to die).
One of the real selling points of the C memcached is its simple and robust slab memory allocator. Does anybody have any info on how Java's GC compares? I could imagine some cases in which it could do better, since it doesn't have to actually grab and free memory, but in general I worry about constantly triggering stop-the-world pauses due to cache churn.
I agree 100%. I know Danga had started an effort to provide a binary protocol, but so far there have been no results.
Yes, the slab allocator is certainly a prominent feature of memcached. Coincidentally, we had this discussion on the JGroups dev mailing list some weeks ago too. I copied the relevant section below:
Correct, but that's a feature of Java versus C in general, and not
PartitionedHashMap in particular.
memcached uses something similar to a buddy memory allocation scheme
([1]), which is great, but they need to make sure they don't waste
memory. For instance, if you always allocate pages of 500 bytes, then
this mechanism is not the best, because the smaller pages won't get
used, and the larger pages are wasted, unless they get fragmented into
smaller ones.
I'll take the stance that, unless you know exactly what the avg size of
your app's memory requirements is, the OS does a better job at
allocating memory and in addition you'll benefit from future
improvements in the mem allocator code of your OS.
memcached probably shines when you know exactly what the memory page
sizes are and you change the src to accommodate that.
I'd also claim that even with GC, this is very useful, because the few
GC cycles are a good tradeoff against having to go to the DB.
Note that we could implement something like memcached's memory
allocator, too: grab direct memory (ByteBuffer.allocateDirect() /
MappedByteBuffer), divide it up into lists of fixed sizes (buddy pools)
and then use those buffers to store data. Direct memory is allocated
outside of the Java heap, so it will never get garbage collected, but
TBH I'm not sure this is a good idea. I mean, we're coming back to Java
versus C here. There's a reason I switched and a big part is garbage
collection and the avoidance of dangling pointers.
I've attached the doc describing memcached's memory allocation strategy.
[1] en.wikipedia.org/wiki/Buddy_memory_allocation
Hanson Char wrote:
> One of the major benefits of using the native memcached is that unlike
> a JVM, GC (full GC in particular) can be entirely avoided. Wouldn't
> that benefit be lost if a memcached impl is done entirely in Java ?
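The direct-memory allocator Bela describes (grab direct memory, divide it into lists of fixed sizes, and store data in those buffers) might look roughly like this. It is a hypothetical sketch, not code from JGroups or memcached: the class name and parameters are invented, chunk sizes grow by a geometric factor in the spirit of memcached's slab classes (memcached's default growth factor is 1.25), and an exhausted size class simply returns null where memcached would evict items from that class instead.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical slab-style allocator over off-heap (direct) memory.
final class DirectSlabAllocator {
    private final List<Integer> chunkSizes = new ArrayList<>();
    private final List<ArrayDeque<ByteBuffer>> freeLists = new ArrayList<>();

    DirectSlabAllocator(int minChunk, int maxChunk, double growthFactor, int chunksPerClass) {
        if (minChunk <= 0 || maxChunk < minChunk || growthFactor <= 1.0 || chunksPerClass <= 0)
            throw new IllegalArgumentException("bad allocator parameters");
        // Geometric size classes, e.g. 64, 80, 100, 125, ... for minChunk 64 and factor 1.25
        int size = minChunk;
        while (size < maxChunk) {
            chunkSizes.add(size);
            size = Math.max(size + 1, (int) Math.ceil(size * growthFactor));
        }
        chunkSizes.add(maxChunk);
        for (int chunk : chunkSizes) {
            // One off-heap region per class; its contents live outside the Java heap
            ByteBuffer slab = ByteBuffer.allocateDirect(chunk * chunksPerClass);
            ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
            for (int j = 0; j < chunksPerClass; j++) {
                slab.limit((j + 1) * chunk);
                slab.position(j * chunk);
                free.push(slab.slice());   // independent view of one fixed-size chunk
            }
            freeLists.add(free);
        }
    }

    /** Index of the smallest size class whose chunks can hold length bytes, or -1. */
    int classFor(int length) {
        for (int i = 0; i < chunkSizes.size(); i++) {
            if (chunkSizes.get(i) >= length) return i;
        }
        return -1;
    }

    int chunkSize(int sizeClass) {
        return chunkSizes.get(sizeClass);
    }

    /** A chunk whose limit is length, or null if too large or the class is exhausted. */
    ByteBuffer allocate(int length) {
        int cls = classFor(length);
        if (cls < 0) return null;
        ByteBuffer chunk = freeLists.get(cls).poll();
        if (chunk == null) return null;    // memcached would evict from this class instead
        chunk.clear();
        chunk.limit(length);
        return chunk;
    }

    /** Returns a chunk to its class's free list (found via its capacity). */
    void free(ByteBuffer chunk) {
        int cls = chunkSizes.indexOf(chunk.capacity());
        if (cls < 0) throw new IllegalArgumentException("not a chunk from this allocator");
        chunk.clear();
        freeLists.get(cls).push(chunk);
    }
}
```

This also illustrates the sizing point in the email: with 64-byte minimum chunks and a factor of 1.25, a 90-byte item lands in a 100-byte chunk and wastes 10 bytes, and if most items sit just above a class boundary, a large share of every chunk goes unused. The ByteBuffer objects themselves are still small heap objects; only the chunk contents are off-heap.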
I'm sure the JVM guys have explored plain slab allocation before. The trouble is, how do you handle fragmentation in a multi-threaded environment without compaction? That's probably why there are so many GC settings to accommodate different application requirements.
A few months ago, I had written a small blog entry related to allocators - javaforu.blogspot.com/2008/05/memory-dont-forge...
You might also find this interesting - blogs.sun.com/jonthecollector/entry/our_collectors.
JVMs also have Thread-Local Allocation Buffers (TLABs), conceptually similar to the arena allocators found in Google's TCMalloc and other such malloc alternatives.
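The bump-pointer idea behind TLABs and arena allocators fits in a few lines. This is a simplified, hypothetical sketch, not how HotSpot implements TLABs: the class names are invented, each thread gets a private region through a ThreadLocal, allocation just advances an offset with no locking, and memory is reclaimed only all at once via reset().

```java
// Hypothetical bump-pointer arena: allocation is just "advance the offset".
final class BumpArena {
    private final byte[] region;
    private int top = 0;

    BumpArena(int capacity) {
        region = new byte[capacity];
    }

    /** Start offset of a new block of the given size, or -1 if the arena is full. */
    int allocate(int size) {
        if (size < 0) throw new IllegalArgumentException("negative size");
        if (size > region.length - top) return -1;   // full: a JVM would typically hand the thread a new TLAB
        int start = top;
        top += size;
        return start;
    }

    /** Frees everything at once; individual blocks are never freed. */
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }
}

final class PerThreadArenas {
    // One arena per thread, so allocate() never contends with other threads
    private static final ThreadLocal<BumpArena> ARENA =
            ThreadLocal.withInitial(() -> new BumpArena(1 << 16));

    private PerThreadArenas() {}

    static BumpArena current() {
        return ARENA.get();
    }
}
```

In a 100-byte arena, two 40-byte allocations return offsets 0 and 40, and a third fails until reset(). The speed comes from never freeing individual blocks, which is also why fragmentation and compaction remain the hard part for a general-purpose allocator or GC.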
Ashwin.
I wonder whether the Java-based memcached implementation would perform better than the C-based one if the Java client runs in a different JVM on a different node. Has anybody tested the Java implementation this way?