Digg and Reddit Have Joined the NoSQL Camp

Posted by Abel Avram on Mar 20, 2010

Sections: Architecture & Design
Topics: Performance & Scalability, Architecture

Both Digg and Reddit announced this month that they have moved to Cassandra: Digg because MySQL no longer scaled well enough for its workload, and Reddit because of problems with memcacheDB. Some now argue that MySQL + memcached is no longer the de facto scalability solution.

Digg announced its plans to move to Cassandra in September last year, and the process was completed this month. After looking at other projects (HBase, Hypertable, Tokyo Cabinet/Tyrant, Voldemort, and Dynomite), the team settled on Cassandra because:

Each system has its own strengths and weaknesses, but Cassandra has a good blend of everything. It offers column-oriented data storage, so you have a bit more structure than plain key/value stores. It operates in a distributed, highly available, peer-to-peer cluster. While it’s currently lacking some core features, it gets us closer to where we want to be than the other solutions.

Digg has rebuilt its entire infrastructure, moving away from the LAMP stack. The main culprit was MySQL, which the team found increasingly hard to scale for a write-intensive workload:

Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead. …

As our system grows, it's important for us to span multiple data centers for redundancy and network performance and to add capacity or replace failed nodes with no downtime. We plan to continue using commodity hardware, and to continue assuming that it will fail regularly. All of this is increasingly difficult with MySQL.
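The horizontal partitioning mentioned in the quote can be illustrated with a minimal sketch. This is not Digg's code: the shard count, the function names, and the in-memory dictionaries standing in for separate MySQL servers are all assumptions made for illustration. It shows why sharding erodes the value of a relational database: a lookup by key goes to one shard, but a query over any other attribute has to visit every shard and merge the results in application code.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example

# Each dict stands in for one MySQL server holding a slice of the rows.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (not Python's randomized hash())."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def save_user(user_id: str, row: dict) -> None:
    """Write the row to the single shard that owns this key."""
    shards[shard_for(user_id)][user_id] = row


def load_user(user_id: str):
    """Read by key: only one shard is consulted."""
    return shards[shard_for(user_id)].get(user_id)


def count_users_in(country: str) -> int:
    """Query by a non-key attribute: every shard must be scanned and the
    partial results merged by the application, not by the database."""
    return sum(
        1
        for shard in shards
        for row in shard.values()
        if row["country"] == country
    )
```

On a single database, `count_users_in` would be one `WHERE` clause; once the data is split, joins and aggregates of this kind move into the application, which is the overhead the Digg team describes.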

Another website, Reddit, had been having problems with memcacheDB. The team initially addressed them by adding more RAM, but it was clear they needed a long-term solution. They completed the transition to Cassandra in 10 days with one developer and “the help of the amazing Cassandra developers and community and EC2, which allowed us to bring up new instances on which to test and ultimately deploy Cassandra”.

Since many important websites, like Facebook or Twitter, are already using or planning to move to Cassandra, some have declared the end of the MySQL + memcached era as the de facto scalability solution. Todd Hoff does not think MySQL will disappear any time soon, but he no longer sees it as the default choice:

With a little perspective, it's clear the MySQL+memcached era is passing. It will stick around for a while. Old technologies seldom fade away completely. Some still ride horses. Some still use CDs. And the Internet will not completely replace that archaic electro-magnetic broadcast technology called TV, but the majority will move on into a new era. …

It's clear that many of the ideas behind MySQL+memcached were on the mark, we see them preserved in the new systems, it's just that the implementation was a bit clunky. Developers have moved in, filled the gaps, sanded the corners, and made a new sturdy platform which will itself form the basis for a new ecosystem and a new era.

Commenting on Hoff’s remark “it's clear the MySQL+memcached era is passing”, Mark Atwood disagreed, arguing that memcached will remain in use for a long time:

The era of memcached being THE cutting edge technique for getting speed at scale may be "ending", but not because memcached is failing, but because there are additional (not replacement, additional) techniques now emerging. …

But that won't be the end of memcached. The technique of the high-performance key-value store is just too useful of a building block, both on its own, and as a sub-component of other technology components, to just throw out.

I'm sure that memcache will continue to evolve. There will be more implementations, there will be limitations removed, there will be more management tools, there will be other systems that add the memcached network protocol, there will be ORMs and other frameworks that will build in the assumption that memcached is available, there will be features to the protocol and implementations for shared hosting and cloud environments.

Hoff added later in a comment to his post: “I wasn't trying to say caching will go away or that MySQL will go away. I'm a big believer in the whole memory is the new disk proposition. … What has passed is MySQL and memcached, which complement each other so well, as the default platform on which to develop scalable systems.”
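The way MySQL and memcached complement each other is commonly described as the cache-aside pattern. The sketch below is an illustration only, not code from any of the sites mentioned: the class name, the `db_reads` counter, and the plain dictionaries standing in for MySQL and memcached are all assumptions.

```python
class CacheAsideStore:
    """Cache-aside sketch: dicts stand in for MySQL and memcached."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for the MySQL tables
        self.cache = {}           # stand-in for memcached
        self.db_reads = 0         # counts how often a read reaches the database

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Every write still goes to the database, then the stale
        # cache entry is invalidated so the next read refetches it.
        self.database[key] = value
        self.cache.pop(key, None)
```

Repeated reads of the same key reach the database only once, but each `put` costs a database write plus an invalidation. The cache therefore relieves read load and does nothing for a write-intensive workload.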

MySQL and memcached will remain a good answer to many scalability problems, but other, non-SQL solutions are emerging that seem to offer better results for very large systems.

  1. What's In A Name?

    by Paul Tiseo

    *sigh*

    I wish the NoSQL "movement" had picked a better name. When will developers stop conflating the logical with the physical?

  2. Re: What's In A Name?

    by David Peterson

    Okay, how about "LSD"... Lightweight Structured Data? (Or Logically Structured Data... or Loosely Structured Data... take your pick.)

    You will of course note the nice play on ACID ;) Pun intended.

  3. Re: What's In A Name?

    by Assaf Stone

    I second the *sigh*

    One of the major problems I've got with many trends (mostly Agile, but not only), is the need to get in your face. Instead of giving SQL a negative connotation, perhaps the movement could be re-framed in a positive light, like stay-object-oriented, or something snappy to that effect.

  4. Re: What's In A Name?

    by Paul Tiseo

    Well, depends on the root cause of your dislike of your current database situation.

    But, in Digg's case, it seems like they should join the NoMySQLInEdgeCases movement. :)
