Posted by Abel Avram on Mar 20, 2010
Digg and Reddit both announced this month that they have moved to Cassandra because MySQL does not scale well enough for them. Some now consider that MySQL + memcached is no longer the de facto scalability solution.
Digg announced their plans to move to Cassandra in September last year, a process that was completed this month. After looking at other projects (HBase, Hypertable, Tokyo Cabinet/Tyrant, Voldemort, and Dynomite), the team settled on Cassandra because:
Each system has its own strengths and weaknesses, but Cassandra has a good blend of everything. It offers column-oriented data storage, so you have a bit more structure than plain key/value stores. It operates in a distributed, highly available, peer-to-peer cluster. While it’s currently lacking some core features, it gets us closer to where we want to be than the other solutions.
Digg has rebuilt its entire infrastructure, moving away from the LAMP stack. The main culprit was MySQL, which the team found increasingly difficult to scale for a write-intensive workload:
Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead. …
As our system grows, it's important for us to span multiple data centers for redundancy and network performance and to add capacity or replace failed nodes with no downtime. We plan to continue using commodity hardware, and to continue assuming that it will fail regularly. All of this is increasingly difficult with MySQL.
Another website, Reddit, used to have problems with memcacheDB, and they initially addressed them by adding more RAM, but it was clear they needed a long-term solution. They completed the transition to Cassandra in 10 days using one developer with “the help of the amazing Cassandra developers and community and EC2, which allowed us to bring up new instances on which to test and ultimately deploy Cassandra”.
Since many important websites, like Facebook or Twitter, are already using or planning to move to Cassandra, some have announced the end of the MySQL + memcached era as the de facto scalability solution. Todd Hoff does not think MySQL will disappear any time soon, but he no longer sees it as the default first choice:
With a little perspective, it's clear the MySQL+memcached era is passing. It will stick around for a while. Old technologies seldom fade away completely. Some still ride horses. Some still use CDs. And the Internet will not completely replace that archaic electro-magnetic broadcast technology called TV, but the majority will move on into a new era. …
It's clear that many of the ideas behind MySQL+memcached were on the mark, we see them preserved in the new systems, it's just that the implementation was a bit clunky. Developers have moved in, filled the gaps, sanded the corners, and made a new sturdy platform which will itself form the basis for a new ecosystem and a new era.
Mark Atwood disagreed with Hoff’s remark that “it's clear the MySQL+memcached era is passing”, arguing that memcached will remain in use for a long time:
The era of memcached being THE cutting edge technique for getting speed at scale may be "ending", but not because memcached is failing, but because there are additional (not replacement, additional) techniques now emerging. …
But that won't be the end of memcached. The technique of the high-performance key-value store is just to useful of a building block, both on it's own, and as a sub-component of other technology components, to just throw out.
I'm sure that memcache will continue to evolve. There will be more implementations, there will be limitations removed, there will be more management tools, there will be other systems that add the memcached network protocol, there will be ORMs and other frameworks that will build in the assumption that memcached is available, there will be features to the protocol and implementations for shared hosting and cloud environments.
Hoff added later in a comment to his post: “I wasn't trying to say caching will go away or that MySQL will go away. I'm a big believer in the whole memory is the new disk proposition. … What has passed is MySQL and memcached, which complement each other so well, as the default platform on which to develop scalable systems.”
While MySQL and memcached will remain a good solution for many scalability issues, non-SQL alternatives are emerging that seem to offer better results for very large systems.
*sigh*
I wish the NoSQL "movement" had picked a better name. When will developers stop conflating the logical with the physical?
Okay, how about "LSD" ... Lightweight Structured Data? (or Logically Structured Data) ... or Loosely Structured Data ... (take your pick)
You will of course note the nice play on ACID ;) Pun intended.
I second the *sigh*
One of the major problems I've got with many trends (mostly Agile, but really not just), is the need to get in your face. How about instead of giving SQL a negative connotation, perhaps the movement could be re-framed in a positive light, like stay-object oriented, or something snappy to that effect.
Well, depends on the root cause of your dislike of your current database situation.
But, in Digg's case, it seems like they should join the NoMySQLInEdgeCases movement. :)