InfoQ

News

FriendFeed Implements Schema-less Storage Atop MySQL

Posted by Dave West on Mar 25, 2009

Community
Architecture
Topics
Persistence ,
Database Design ,
Performance & Scalability
Tags
MySQL ,
Relational Databases ,
Database Management ,
Casestudy

Bret Taylor of FriendFeed describes a "schemaless solution" to the problem of "storing data with flexible schemas and building new indexes on the fly," for a rapid growth website. The problem itself comes from the need to constantly add new features and update the underlying database structure, and the millions of existing records stored in that database, to support both the old and new features. FriendFeed's solution was to build "a schemaless solution on top of MySQL instead of moving to a different technology base. Bret describes the basic problem:

In particular, making schema changes or adding indexes to a database with more than 10 - 20 million rows completely locks the database for hours at a time. Removing old indexes takes just as much time, and not removing them hurts performance because the database will continue to read and write to those unused blocks on every INSERT, pushing important blocks out of memory. There are complex operational procedures you can do to circumvent these problems (like setting up the new index on a slave, and then swapping the slave and the master), but those procedures are so error prone and heavyweight, they implicitly discouraged our adding features that would require schema/index changes. Since our databases are all heavily sharded, the relational features of MySQL like JOIN have never been useful to us, so we decided to look outside of the realm of RDBMS.

After reviewing several possible solutions, they decided to use a custom "schema-less" solution on top of MySQL instead of opting for a completely different persistence solution.

Their solution was to separate the primary data and indices to that data. "Our datastore stores schema-less bags of properties ... We index data in these entities by storing indexes in separate MySQL tables." This resulted in a trade-off between volume and efficiency.

As a result, we end up having more tables than we had before, but adding and removing indexes is easy. We have heavily optimized the process that populates new indexes (which we call "The Cleaner") so that it fills new indexes rapidly without disrupting the site.

Separating the data and indexes raised issues of consistency and atomicity. Instead of a strict transaction protocol, they made the database tables canonical and used the indexes only for references and reapplied filters to the actual database reads. They enhanced the automated process that keeps tables updated, called 'the Cleaner' to continuously update and fix indexes with updated entities having priority. Inconsistencies, although possible, were cleaned up in less than two seconds on average.

Using Average Page Latency as a metric, Bret reported the following trends.

  • Overall -- a significant decrease in the face of a rising trend.
  • Last 24 hours -- stable even during peak hours.
  • Prior week -- "dramatic" drop.

Bret's post generated a lot of feedback. One theme, "modern RDBMS do not share the limitations of MySQL when it comes to schema evolution." which ignores the cost issue behind the original choice. Other readers responded with alternate solutions.

It is interesting that no one pointed out the similarity between FriendFeed's solution and the "ancient" technology of ISAM - Indexed Sequential Access Method. ISAM used the same basic architecture - separating data and indices with automatic updates of the indices when data was changed.

it begs the question by Ivan Lazarte Posted Mar 25, 2009 10:46 PM
Re: it begs the question by Florin T.PATRASCU Posted Mar 27, 2009 8:30 AM
  1. Back to top

    it begs the question

    Mar 25, 2009 10:46 PM by Ivan Lazarte

    why even use a database for their particular storage needs? In fact, why not just use object serialization...

  2. Back to top

    Re: it begs the question

    Mar 27, 2009 8:30 AM by Florin T.PATRASCU

    @Ivan; good point.

Educational Content

Brian Marick on 4 Challenges and 5 Guiding Values of Agile Software Development

Brian Marick takes us through a quick tour of the most important values and challenges to adopting Agile successfully (they aren't the typical challenges and values we hear in the community).

Are You a Software Architect?

The line between development and architecture is tricky. Does it exist at all? Is an ivory tower actually needed? There's a balance in the middle, but how do you move from developer to architect?

Agile – A Way of Life and Pragmatic Use of Authority

The word 'authority' sometimes produces an allergic response in hard-line agilists. Freedom and authority – both are bad if misused and both are good if used in right spirit for a noble cause.

Getting Started with Grails, Second Edition

"Getting Started with Grails" brings you up to speed on this modern web framework. Companies as varied as LinkedIn, Wired, and Taco Bell are all using Grails. Are you ready to get started as well?

Using ITIL V3 as a Foundation for SOA Governance

Those familiar with only ITIL V2 often scoff at the thought that ITIL could serve as a governance framework for SOA. With ITIL V3, the focus of the framework shifted towards service-orientation.

Adrian Colyer on AspectJ, tc Server and dm Server

SpringSource CTO Adrian Colyer discusses AspectJ, SpringSource's dm Server and tc Server products, OSGi and Scrum.

Adam Wiggins on Heroku

Heroku's Adam Wiggins talks about Rails, Background Jobs, Add-Ons, Ruby, and how Heroku manages to work around Ruby's inefficiencies using Erlang and other languages.

SOA as an Architectural Pattern: Best Practices in Software Architecture

For Grady Booch the foundation of a good architecture is patterns, SOA being just one of many patterns. In this Second Life presentation, Booch attempts to bring more clarity on what architecture is.