Digg and Reddit Have Joined the NoSQL Camp
Both Digg and Reddit announced their move to Cassandra this month because their existing storage, MySQL in Digg's case, no longer scaled well enough for them. Some now argue that MySQL + memcached is no longer the de facto scalability solution.
Digg announced its plans to move to Cassandra in September last year, a process that was completed this month. After evaluating other projects (HBase, Hypertable, Tokyo Cabinet/Tyrant, Voldemort, and Dynomite), the team settled on Cassandra because:
Each system has its own strengths and weaknesses, but Cassandra has a good blend of everything. It offers column-oriented data storage, so you have a bit more structure than plain key/value stores. It operates in a distributed, highly available, peer-to-peer cluster. While it’s currently lacking some core features, it gets us closer to where we want to be than the other solutions.
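The quote contrasts Cassandra's column-oriented storage with plain key/value stores. A rough way to picture the difference: a key/value store maps each key to one opaque value, while a Cassandra column family maps each row key to a set of named columns kept in sorted order, each carrying a value and a write timestamp that is used to resolve conflicting writes (the newest timestamp wins). The class below is a toy in-memory model of that layout, written only for illustration; it is not Cassandra's client API, and the class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class ColumnFamily:
    """Toy in-memory model of a Cassandra-style column family (hypothetical class).

    Each row key maps to its own set of named columns; rows need not share
    the same columns. Every column stores a value and a write timestamp.
    """

    def __init__(self) -> None:
        # row key -> column name -> (value, timestamp)
        self._rows: Dict[str, Dict[str, Tuple[bytes, int]]] = {}

    def insert(self, row_key: str, column: str, value: bytes, timestamp: int) -> None:
        columns = self._rows.setdefault(row_key, {})
        current = columns.get(column)
        # Last write wins: an older timestamp never overwrites a newer one.
        if current is None or timestamp >= current[1]:
            columns[column] = (value, timestamp)

    def get(self, row_key: str, column: str) -> Optional[bytes]:
        entry = self._rows.get(row_key, {}).get(column)
        return None if entry is None else entry[0]

    def columns_between(self, row_key: str, start: str, finish: str) -> Dict[str, bytes]:
        """Return the row's columns whose names fall in [start, finish], in sorted name order."""
        columns = self._rows.get(row_key, {})
        return {name: columns[name][0] for name in sorted(columns) if start <= name <= finish}
```

For example, one user row might hold `name` and `email` columns while another holds only `name`, and a single column can be read or updated on its own; a plain key/value store would typically keep each user as one serialized value that is read and rewritten as a whole.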
Digg has rebuilt its entire infrastructure, moving away from the LAMP stack. The main culprit was MySQL which, like other SQL databases, is optimized for reads and did not handle Digg's write-intensive workload well:
Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead. …
As our system grows, it's important for us to span multiple data centers for redundancy and network performance and to add capacity or replace failed nodes with no downtime. We plan to continue using commodity hardware, and to continue assuming that it will fail regularly. All of this is increasingly difficult with MySQL.
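The quote mentions hand-rolled horizontal partitioning and the difficulty of adding capacity without downtime. A common hand-rolled scheme assigns each key to `shards[hash(key) % len(shards)]`, which means changing the number of shards reassigns most existing keys. Cassandra instead gives each node a token on a ring and assigns a key to the first node whose token is at or after the key's hash (consistent hashing), so a new node only takes over the keys in its own range. The sketch below contrasts the two under simplifying assumptions: one token per node, no replication, MD5 as the hash, and hypothetical node names and helpers. It is not Digg's partitioning code or Cassandra's implementation.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def stable_hash(value: str) -> int:
    # Python's built-in hash() is salted per process, so use a stable digest instead.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def modulo_shard(key: str, shards: List[str]) -> str:
    # Hand-rolled horizontal partitioning: the owner depends on len(shards),
    # so adding a shard reassigns most existing keys.
    return shards[stable_hash(key) % len(shards)]


class TokenRing:
    """Simplified consistent-hash ring, one token per node (hypothetical helper)."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring: List[Tuple[int, str]] = sorted((stable_hash(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (stable_hash(node), node))

    def node_for(self, key: str) -> str:
        # A key belongs to the first node whose token is >= the key's hash,
        # wrapping around to the first node past the end of the ring.
        tokens = [t for t, _ in self._ring]
        i = bisect.bisect_left(tokens, stable_hash(key))
        return self._ring[i % len(self._ring)][1]


def moved_fraction(before: Dict[str, str], after: Dict[str, str]) -> float:
    """Fraction of keys whose owner changed between two assignments."""
    return sum(1 for k in before if before[k] != after[k]) / len(before)
```

Going from four nodes to five, `modulo_shard` is expected to move about four keys in five, since a key keeps its shard only when its hash leaves the same remainder modulo 4 and modulo 5. With `TokenRing`, the only keys that move are those falling into the new node's range, and all of them move to the new node. Real Cassandra clusters also replicate each range to several nodes, which this sketch leaves out.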
Reddit, another website, had been having problems with memcacheDB. The team initially addressed them by adding more RAM, but it was clear they needed a long-term solution. They completed the transition to Cassandra in 10 days, using one developer, with “the help of the amazing Cassandra developers and community and EC2, which allowed us to bring up new instances on which to test and ultimately deploy Cassandra”.
Since many prominent websites, such as Facebook and Twitter, already use Cassandra or plan to move to it, some have declared the end of the MySQL + memcached era as the de facto scalability solution. Todd Hoff does not think MySQL will disappear any time soon, but he believes it will no longer be the default choice:
With a little perspective, it's clear the MySQL+memcached era is passing. It will stick around for a while. Old technologies seldom fade away completely. Some still ride horses. Some still use CDs. And the Internet will not completely replace that archaic electro-magnetic broadcast technology called TV, but the majority will move on into a new era. …
It's clear that many of the ideas behind MySQL+memcached were on the mark, we see them preserved in the new systems, it's just that the implementation was a bit clunky. Developers have moved in, filled the gaps, sanded the corners, and made a new sturdy platform which will itself form the basis for a new ecosystem and a new era.
Commenting on Hoff’s remark that “it's clear the MySQL+memcached era is passing”, Mark Atwood disagreed, arguing that memcached will remain in use for a long time:
The era of memcached being THE cutting edge technique for getting speed at scale may be "ending", but not because memcached is failing, but because there are additional (not replacement, additional) techniques now emerging. …
But that won't be the end of memcached. The technique of the high-performance key-value store is just too useful of a building block, both on its own, and as a sub-component of other technology components, to just throw out.
I'm sure that memcache will continue to evolve. There will be more implementations, there will be limitations removed, there will be more management tools, there will be other systems that add the memcached network protocol, there will be ORMs and other frameworks that will build in the assumption that memcached is available, there will be features to the protocol and implementations for shared hosting and cloud environments.
Hoff added later in a comment to his post: “I wasn't trying to say caching will go away or that MySQL will go away. I'm a big believer in the whole memory is the new disk proposition. … What has passed is MySQL and memcached, which complement each other so well, as the default platform on which to develop scalable systems.”
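The MySQL + memcached pairing Hoff describes is commonly used as a cache-aside (lazy-loading) layer: the application checks memcached first, falls back to MySQL on a miss, stores the result in the cache, and invalidates the cached entry when the row changes. The sketch below shows that pattern with Python's built-in `sqlite3` standing in for MySQL and a plain dict standing in for memcached, so it runs without either server; the `stories` table, the cache key format, and the `CacheAside` class are hypothetical.

```python
import sqlite3
from typing import Dict, Optional


class CacheAside:
    """Cache-aside reads over a SQL table (hypothetical helper; dict stands in for memcached)."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.cache: Dict[str, str] = {}  # stand-in for a memcached client
        self.db_reads = 0  # counts how often we fell through to the database

    def get_story_title(self, story_id: int) -> Optional[str]:
        key = f"story:{story_id}:title"
        if key in self.cache:  # cache hit: no database round trip
            return self.cache[key]
        self.db_reads += 1
        row = self.db.execute("SELECT title FROM stories WHERE id = ?", (story_id,)).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]  # populate the cache for later reads
        return row[0]

    def update_story_title(self, story_id: int, title: str) -> None:
        self.db.execute("UPDATE stories SET title = ? WHERE id = ?", (title, story_id))
        self.db.commit()
        # Invalidate rather than overwrite, so the next read reloads from the database.
        self.cache.pop(f"story:{story_id}:title", None)
```

Every write still goes to the database, which is why this pairing speeds up read-heavy sites but does little for the write-intensive growth Digg describes.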
While MySQL and memcached will remain a good solution for many scalability problems, non-SQL alternatives are emerging that seem to offer better results for very large systems.
What's In A Name?
I wish the NoSQL "movement" had picked a better name. When will developers stop conflating the logical with the physical?
Re: What's In A Name?
You will of course note the nice play on ACID ;) Pun intended.
Re: What's In A Name?
One of the major problems I've got with many trends (mostly Agile, but not only Agile) is the need to get in your face. Instead of giving SQL a negative connotation, perhaps the movement could be reframed in a positive light, like stay-object-oriented, or something snappy to that effect.