InfoQ

Architecting Twitter

Posted by Steven Robbins on Jun 12, 2008

Community: Architecture, Ruby
Topics: Performance & Scalability, Database Design, Web 2.0

The architecture behind the popular social application Twitter has been the subject of much discussion lately. Twitter has suffered several outages and has turned off a number of popular features while the team works on the underlying problems. What can be learned from how Twitter is trying to move forward?

Several people, including Om Malik and Dare Obasanjo, speculated about the underlying architecture of Twitter that led to the problems. More recently, Robert Scoble interviewed Twitter's Evan Williams and Biz Stone about what goes on behind the scenes of the application and about the company's future. The full video of the interview is available on Qik.

In the interview, Williams and Stone answered one of the big questions regarding Twitter's data architecture: Is Twitter using a Single Instance Storage (SIS) approach, in which each user message is stored only once? At around the 13-minute mark in the interview, Williams talked about message storage and user timeline retrieval:
It doesn't do that [make a copy of the message for every user's follower], but that actually might be more efficient. Right now it goes into a database and when people want to get their timeline we construct the timeline out of the database and then, not every time, we then cache it in memory. But because things are written so often we are going to the database a lot just to update the cache. So there are lots of copies [of a message] in the cache but there's only one on disk. Our future architecture may be more like we're writing it many times because reading it will be a lot faster that way.
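The two approaches Williams contrasts can be illustrated with a small in-memory sketch. This is not Twitter's code: the class names `ReadTimeTimeline` and `WriteTimeTimeline` and their methods are hypothetical, and plain Python lists stand in for the database and cache. The first class stores each message once and assembles a timeline on every read; the second copies each message into every follower's timeline at write time.

```python
from collections import defaultdict


class ReadTimeTimeline:
    """Single copy per message; the timeline is assembled on each read."""

    def __init__(self):
        self.messages = []                 # one (author, text) entry per message
        self.following = defaultdict(set)  # user -> authors that user follows

    def follow(self, user, author):
        self.following[user].add(author)

    def post(self, author, text):
        # Cheap write: the message is stored exactly once.
        self.messages.append((author, text))

    def timeline(self, user):
        # Expensive read: scan the store and filter by followed authors.
        followed = self.following[user]
        return [text for author, text in reversed(self.messages)
                if author in followed]


class WriteTimeTimeline:
    """One copy per follower; the timeline is read back directly."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> users following them
        self.timelines = defaultdict(list)  # user -> precomputed timeline

    def follow(self, user, author):
        self.followers[author].add(user)

    def post(self, author, text):
        # Expensive write: one copy for every current follower.
        for user in self.followers[author]:
            self.timelines[user].append(text)

    def timeline(self, user):
        # Cheap read: newest first, no filtering needed.
        return list(reversed(self.timelines[user]))
```

Both classes return the same timeline for the same sequence of follows and posts; what differs is where the work happens. The write-time version trades extra storage and slower posts for faster reads, which is the trade-off Williams describes.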
The possibility of moving away from an SIS message architecture opens the door to data partitioning techniques such as sharding, which are already popular with many high-volume sites and applications. Randy Shoup described how eBay architected its systems for high scalability, in part by using sharding:
The more challenging problem arises at the database tier, since data is stateful by definition. Here we split (or "shard") the data horizontally along its primary access path. User data, for example, is currently divided over 20 hosts, with each host containing 1/20 of the users. As our numbers of users grow, and as the data we store for each user grows, we add more hosts, and subdivide the users further. Again, we use the same approach for items, for purchases, for accounts, etc. Different use cases use different schemes for partitioning the data
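A minimal sketch of splitting user data horizontally along its primary access path, the user ID, might look like the following. The quote gives the number of hosts (20) but not how eBay routes a user to a host, so the CRC32-modulo routing below is an assumption, and `shard_for_user` and `ShardedUserStore` are hypothetical names. In-memory dictionaries stand in for the database hosts.

```python
import zlib

NUM_SHARDS = 20  # the quote mentions 20 hosts; the routing scheme is assumed


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index with a stable hash."""
    # zlib.crc32 is deterministic across runs, unlike the built-in hash()
    # for strings, which is randomized per process by default.
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


class ShardedUserStore:
    """Each dict stands in for one database host holding a slice of users."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, user_id: str, record: dict) -> None:
        index = shard_for_user(user_id, len(self.shards))
        self.shards[index][user_id] = record

    def get(self, user_id: str):
        # Only one shard is consulted for any single-user lookup.
        index = shard_for_user(user_id, len(self.shards))
        return self.shards[index].get(user_id)
```

One caveat with plain modulo routing: changing the number of shards reassigns most keys, so growing the pool as the quote describes would need a data migration or a different routing scheme.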
Bogdan Nicolau wrote an overview of the basics of database sharding. In the series, Bogdan discussed how to decide where and how to divide an application's data. His main point on that decision was:
What I’m trying to say is that no matter the logic you chose to split a table, always keep in mind that you want 0 join, order by or limit clauses which would require more than one table shards.
Bogdan then moved on to the application side of using shards. Along with several code samples for an example problem, he explained why the approach should work:
As you can see, the weight now sits in the writing part, as the mapping table must be populated. When reading, the splitting of the data algorithm involved is no longer a concern.
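The mapping-table idea in this quote can be sketched as below. This is not Nicolau's code: `DirectoryShardedStore` and its least-loaded placement policy are assumptions made for illustration, with dictionaries standing in for the shard tables. The point it shows is that the placement decision and the mapping update happen on the write path, while a read only looks up the mapping and never needs to know how the data was split.

```python
class DirectoryShardedStore:
    """Sharding with an explicit mapping table (key -> shard index)."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]
        self.mapping = {}  # the mapping table, populated on write

    def _choose_shard(self, key) -> int:
        # Hypothetical placement policy: pick the shard holding the fewest
        # rows. Any policy works, because reads never re-run it.
        return min(range(len(self.shards)), key=lambda i: len(self.shards[i]))

    def write(self, key, value) -> None:
        shard = self.mapping.get(key)
        if shard is None:
            # The extra work sits here: decide placement and record it.
            shard = self._choose_shard(key)
            self.mapping[key] = shard
        self.shards[shard][key] = value

    def read(self, key):
        # Reads consult the mapping table only.
        shard = self.mapping.get(key)
        if shard is None:
            return None
        return self.shards[shard].get(key)
```

Because the placement policy is consulted only when a key is first written, it can change later without affecting how existing rows are read.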
With so many people involved in the discussion of how to scale Web 2.0 applications, perhaps Twitter will continue to move towards a more stable, scalable architecture.

  1. Shameless plug

    Jun 13, 2008 11:44 AM by Bill Burke

    Some things to help with shards:


    Hibernate Shards

    Metamatrix: incredible at federating and mapping data.

  2. Wrong direction

    Jun 13, 2008 1:52 PM by Jason Carreira

    Any solution that includes writing to a database as part of the critical path of adding a status is asking for problems. I've actually been working on a micro-blogging app and it's amazing how poorly architected for this purpose Twitter has been. Messaging systems aren't new or novel, there's no reason to re-invent the wheel, only making it square this time.
