New-age Transactional Systems - Not Your Grandpa's OLTP
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
Posted by Jonathan Allen on Jul 12, 2007
Instead of ASP.NET's built-in caching, some .NET developers are turning to memcached, a distributed memory caching system originally developed by Danga Interactive for LiveJournal.
A fundamental problem with caching is stale data. When running a single web server, one can easily clear a cache when data is known to have changed. Unfortunately, ASP.NET doesn't have a good way to scale this up to multiple servers. Each server's cache is blissfully unaware of changes to the other caches.
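To make the stale-data problem concrete, here is a minimal sketch in Python (not ASP.NET) that models two web servers, each with its own private in-process cache. The `LocalCache` class, the `database` dict and the key names are hypothetical stand-ins; the point is only that clearing an entry on one server leaves the other server's copy untouched.

```python
# Hypothetical illustration: each web server keeps its own in-process cache,
# so invalidating an entry on one server does not affect the others.


class LocalCache:
    """A stand-in for a per-server, in-process cache (dict-backed)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


# Hypothetical backing store shared by all web servers.
database = {"product:42:price": 10}

# Two web servers, each with its own private cache.
server_a = LocalCache()
server_b = LocalCache()


def read_price(cache):
    """Cache-aside read: try the cache first, fall back to the 'database'."""
    value = cache.get("product:42:price")
    if value is None:
        value = database["product:42:price"]
        cache.set("product:42:price", value)
    return value


# Both servers warm their caches.
read_price(server_a)
read_price(server_b)

# Server A handles an update and clears only *its own* cache entry.
database["product:42:price"] = 12
server_a.delete("product:42:price")

price_seen_by_a = read_price(server_a)  # cache miss, reloads the new value
price_seen_by_b = read_price(server_b)  # served from its stale private cache
print(price_seen_by_a, price_seen_by_b)
```

Server B keeps serving the old price until its entry expires or something tells it to invalidate, which is exactly the coordination that per-server caches lack.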
ASP.NET does allow triggers based on changes to the file system or a database table to invalidate a cache. However, these have drawbacks, such as the expensive polling that database triggers cause and the tedious work of wiring up the triggers themselves. There are other options, however.
Unlike ASP.NET's built-in caching, memcached is a distributed cache. Any web server in the farm can update or delete a cache entry, and all the other servers automatically see the change the next time they access the cache. This works because the entries are stored on one or more dedicated cache servers rather than in each web server's own memory; each entry is assigned to a server based on a hash of its key.
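The key-to-server assignment can be sketched as follows. This is a toy model, not a real memcached client: the server names are hypothetical, the "servers" are plain dicts, and it uses a simple hash-modulo scheme (many real clients offer consistent hashing instead, so that adding a node moves fewer keys). What it shows is that every web server computes the same owner for a given key, so they all read and write the same copy.

```python
import hashlib

# Hypothetical cache server addresses.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211"]


def server_for(key, servers=SERVERS):
    """Pick the owning server by hashing the key (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


class ShardedCache:
    """Toy distributed cache: one dict per 'server', routed by key hash."""

    def __init__(self, servers=SERVERS):
        self.servers = list(servers)
        self.stores = {s: {} for s in self.servers}

    def _store(self, key):
        return self.stores[server_for(key, self.servers)]

    def get(self, key):
        return self._store(key).get(key)

    def set(self, key, value):
        self._store(key)[key] = value

    def delete(self, key):
        self._store(key).pop(key, None)


# One shared cache tier, standing in for network access from every web server.
shared = ShardedCache()
shared.set("product:42:price", 10)

# Web server A writes a new value...
shared.set("product:42:price", 12)

# ...and web server B, hashing the same key, reaches the same server
# and sees the update without any extra invalidation step.
seen_by_b = shared.get("product:42:price")
print(server_for("product:42:price"), seen_by_b)
```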
Superficially, the ASP.NET API for memcached looks almost identical to the built-in API, so switching to memcached can be as easy as a single search-and-replace pass.
Moving beyond just getting it running, however, there are some questions as to its proper use in larger web farms. Richard Jones writes:
As we add more nodes, the usefulness of get_multi decreases - it's possible for a single page to hit almost all of the memcached instances. I read somewhere that facebook partition their memcached cluster to improve get_multi performance (eg, all user data on a subset of mc nodes). Can anyone comment on the effectiveness of this?
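Why get_multi loses value as the cluster grows can be seen with a small sketch. Assuming a client that splits a multi-get into one request per owning server (a common strategy, though individual clients may differ), the number of round trips for a page equals the number of distinct servers its keys hash to. The key names and the hash-modulo placement below are hypothetical.

```python
import hashlib
from collections import defaultdict


def server_index(key, n_servers):
    """Hash-modulo placement, as in a simple memcached client."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_servers


def plan_get_multi(keys, n_servers):
    """Group keys by owning server; each group costs one request."""
    batches = defaultdict(list)
    for key in keys:
        batches[server_index(key, n_servers)].append(key)
    return dict(batches)


# Hypothetical keys needed to render one page.
page_keys = [f"user:{uid}:profile" for uid in range(50)]

for n in (1, 2, 8, 32):
    requests = len(plan_get_multi(page_keys, n))
    print(f"{n:>2} servers -> {requests} requests for one page")
```

With a single server the whole page is one round trip; as nodes are added, the same 50 keys scatter across more of them, so a "single" get_multi turns into many smaller requests.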
One proposed solution is to generate hash keys separately from the cache entry's key. This would let a developer make it more likely that all the entries needed by a given page are on the same server. Unfortunately, generating hash keys based on where one wants the data stored, rather than from the cache key itself, is likely to be error prone and will require careful implementation.
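A minimal sketch of that idea, under stated assumptions: `route` and `pick_server` are hypothetical helpers, not part of any particular memcached client, and the server list is made up. Entries are normally placed by hashing their own cache key, but the caller may supply a separate hash key (here, one per user) so that related entries share a server.

```python
import hashlib
from typing import Optional

# Hypothetical cache server addresses.
SERVERS = ["cache1:11211", "cache2:11211", "cache3:11211", "cache4:11211"]


def pick_server(hash_key):
    """Map a hash key to a server with a simple modulo scheme."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:4], "big") % len(SERVERS)]


def route(cache_key, hash_key: Optional[str] = None):
    """Route by an explicit hash key if given, else by the cache key itself."""
    return pick_server(hash_key if hash_key is not None else cache_key)


user_id = "1234"
page_keys = [
    f"user:{user_id}:profile",
    f"user:{user_id}:friends",
    f"user:{user_id}:inbox",
]

# Default routing: each entry lands wherever its own key happens to hash.
default_servers = {route(k) for k in page_keys}

# Explicit hash key: every entry for this user goes to the same server,
# so a single get_multi round trip can fetch them all.
grouped_servers = {route(k, hash_key=f"user:{user_id}") for k in page_keys}

# Pitfall: every reader and writer must pass the *same* hash key for an
# entry; a caller that forgets it may look on a different server and miss.
print(default_servers, grouped_servers)
```

The pitfall noted in the last comment is the error-prone part: the placement rule now lives in application code at every call site instead of being derived mechanically from the key.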
You can download memcached under a BSD license. Client APIs for C#, as well as for Perl, Python, PHP, Java, and a host of other languages, have to be installed separately. Finally, there is a Win32 port for those who don't want to run Linux machines.
I am actually doing some work on a side project that could very much use this! Thanks for the story.
You can find a similar and much more enhanced implementation of a caching solution (a.k.a. a data grid) through GigaSpaces.
It comes with built-in partitioning, an SLA-driven container, support for Spring, SQL query support, etc. A totally free community edition is also available, which supports basic hub/spoke clustering, persistence, etc.
And there is a .net implementation available as well.
HTH
Nati S.
GigaSpaces write once scale anywhere
We've been seeing a good number of migrations from Memcached to Coherence lately. In addition to the support model, the main reasons seem to be the "opaqueness" of using Memcached and the need for features such as reliable write-behind, which is necessary for high-performance database integration.
Peace,
Cameron Purdy
Oracle Coherence: Data Grid
If you are looking for an open source alternative, CSQL Cache is the answer.
It supports bi-directional, updateable, real-time table caching for applications. Check out this blog for its functionality: csqlcache.wordpress.com
Project Page:
sourceforge.net/projects/csql