Steve Huffman, co-founder of Reddit, shares the main lessons he learned scaling Reddit from a small web application to a large social website.
Reddit was started by Steve Huffman and Alexis Ohanian in 2005 and originally consisted of the web application, the application server and the database all running on one machine. Over time, Reddit has grown to 7.5M users/month and 270M page views/month. In a presentation, Huffman shared what he learned along the way, including many of the mistakes they made and how they corrected them.
1. Crash Often
They had lots of crashes in the beginning, and Huffman used to sleep with his laptop next to him, waking every few hours to check whether the website was still up and running. The solution they found was to use supervise, a daemon which restarts crashed applications. That led to an interesting approach to running and designing an application: if the app had memory leaks or consumed too much memory, they simply killed it and restarted it from scratch. This was not a definitive solution, but a temporary one until the application could be fixed based on what was found in the log files.
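The restart-on-crash idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not supervise itself (which is a standalone daemon), and the application command is a hypothetical placeholder:

```python
import subprocess
import time

# Minimal sketch of the supervise idea: keep restarting the application
# whenever it exits, and rely on log files to diagnose the crashes later.
APP_COMMAND = ["python", "app.py"]  # hypothetical application entry point

def supervise(command):
    while True:
        process = subprocess.Popen(command)
        process.wait()  # block until the application crashes or exits
        print("application exited with code %s, restarting" % process.returncode)
        time.sleep(1)   # brief pause to avoid a tight restart loop

if __name__ == "__main__":
    supervise(APP_COMMAND)
```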
2. Separation of Services
Huffman suggests grouping similar processes together on the same machine or group of machines to avoid constant context switching, which consumes resources. He also considers it good practice to keep similar types of data on the same database machine, in order to avoid swapping the index cache all the time, and to move other types of data to other machines.
Huffman strongly suggests avoiding threads, which in Python are “the kiss of death, a recipe for slowness”. If various tasks are assigned to separate processes rather than threads, it is easy to move those processes to different machines when demand rises and more resources are needed. The only problem with that is inter-process communication, but otherwise it is better because the architecture can grow more smoothly.
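A minimal sketch of the processes-instead-of-threads approach using Python’s multiprocessing module; the worker functions and job names are hypothetical placeholders:

```python
from multiprocessing import Process, Queue

# Each task type runs in its own process rather than a thread, so it can
# later be moved to a different machine with only the inter-process
# communication (here a Queue) needing to change.

def render_pages(jobs):
    for job in iter(jobs.get, None):           # None is the shutdown signal
        print("rendering", job)

def update_listings(jobs):
    for job in iter(jobs.get, None):
        print("updating listing for", job)

if __name__ == "__main__":
    render_jobs, listing_jobs = Queue(), Queue()
    workers = [Process(target=render_pages, args=(render_jobs,)),
               Process(target=update_listings, args=(listing_jobs,))]
    for w in workers:
        w.start()

    render_jobs.put("/r/programming")          # hypothetical work items
    listing_jobs.put("link-42")

    for q in (render_jobs, listing_jobs):
        q.put(None)                            # tell the workers to stop
    for w in workers:
        w.join()
```

Because the workers communicate only through queues, the same code can later be split across machines by swapping the in-process queue for a network-based one.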
3. Open Schema
Every new feature required a schema update, which became more and more painful as the database grew. Adding a new column to a table with 10 million rows can take a long time, especially when a backup copy and a replica also have to be updated. There were days when they ran without a backup because the replica was being rebuilt during that time.
The solution was to use an open schema, also called an entity-attribute-value model, which is essentially a key-value store. Now there are two tables for each data type: a Thing table and a Data table.
Things can be Users, Links, Comments, etc., all sharing the same schema. The Data table contains a huge number of rows but only three columns: an ID, a Key and a Value. The result of this new schema is that adding new features no longer requires changing the schema or creating new tables. Also, there are no more joins in the database, so the database can be easily distributed.
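A rough sketch of what such an entity-attribute-value layout looks like, here using SQLite purely for illustration; the table and column names are assumptions, not Reddit’s actual schema:

```python
import sqlite3

# Open (entity-attribute-value) schema: one generic Thing table and one
# Data table holding key/value rows per thing. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE data  (thing_id INTEGER, key TEXT, value TEXT);
""")

def create_thing(thing_type, **attributes):
    cur = conn.execute("INSERT INTO thing (type) VALUES (?)", (thing_type,))
    thing_id = cur.lastrowid
    for key, value in attributes.items():
        conn.execute("INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
                     (thing_id, key, str(value)))
    return thing_id

def load_thing(thing_id):
    rows = conn.execute("SELECT key, value FROM data WHERE thing_id = ?",
                        (thing_id,))
    return dict(rows)

# Adding a new attribute never requires an ALTER TABLE:
link_id = create_thing("Link", title="Hello world", ups=1, downs=0)
print(load_thing(link_id))  # {'title': 'Hello world', 'ups': '1', 'downs': '0'}
```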
4. Keep it Stateless
One goal for any web application is for any of its app servers to be able to handle any request. That is very easy to achieve when there is only one machine, but it gets complicated with multiple servers and application state cached on each of them: every server added increased cache redundancy and made cached data harder to access.
The solution in this case was to switch to memcache and to stop keeping state on any app server. One of the immediate benefits was that a crashed application server no longer affected any other server. Scaling also became easy: simply add more servers.
It is important to separate the cache server from other servers to avoid resource contention.
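As a sketch, assuming the python-memcached client library and a hypothetical cache server address, shared state such as sessions lives in memcache instead of on any individual app server:

```python
import memcache

# Shared state lives in memcache, not in the app server's memory, so any
# server can handle any request and a crashed server loses nothing.
# The server address and key names are assumptions for illustration.
cache = memcache.Client(["cachehost:11211"])

def save_session(session_id, data):
    cache.set("session:" + session_id, data, time=3600)  # expire after 1 hour

def load_session(session_id):
    return cache.get("session:" + session_id) or {}

save_session("abc123", {"user": "example_user"})
print(load_session("abc123"))
```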
5. Memcache for Everything
Reddit started to use memcache for almost everything: database data, session data, rendered pages, memoizing internal functions, pre-computed pages and global locking. They also used memcachedb for data persistence.
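Memoizing an internal function through memcache might look roughly like this (again assuming the python-memcached client; the decorator, key format and example function are hypothetical):

```python
import functools
import memcache

cache = memcache.Client(["cachehost:11211"])  # hypothetical cache server

def memoize(ttl=300):
    """Cache a function's return value in memcache, keyed by its arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = "memo:%s:%s" % (func.__name__, ":".join(map(str, args)))
            result = cache.get(key)
            if result is None:                 # cache miss: compute and store
                result = func(*args)
                cache.set(key, result, time=ttl)
            return result
        return wrapper
    return decorator

@memoize(ttl=60)
def hot_links(subreddit):
    # stand-in for an expensive database query
    return ["link-1", "link-2", "link-3"]
```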
6. Store Redundant Data
A recipe for slowness is to “keep data normalized until you need it.” When the user needs some data presented in a specific format, taking the raw data and processing it on the spot may delay the response to the point that the user gives up waiting for it. The solution is to keep every format used to display the data both in memory and on disk. While this approach costs some disk space and memory, it has the advantage of a quick response to the user’s request.
For Reddit, the key to speed is to “pre-compute everything and dump it on memcache.”
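A sketch of the pre-compute approach, reusing the hypothetical memcache client from above: every sort order of a listing is stored redundantly, so serving a request becomes a single cache lookup. The scoring formulas are made-up stand-ins, not Reddit’s ranking algorithms:

```python
import memcache

cache = memcache.Client(["cachehost:11211"])  # hypothetical cache server

SORT_ORDERS = {
    "hot": lambda link: link["score"] / (link["age_hours"] + 2),  # made-up formula
    "new": lambda link: -link["age_hours"],
    "top": lambda link: link["score"],
}

def precompute_listings(subreddit, links):
    """Store one pre-sorted copy of the listing per sort order (redundant data)."""
    for name, key_func in SORT_ORDERS.items():
        ordered = sorted(links, key=key_func, reverse=True)
        cache.set("listing:%s:%s" % (subreddit, name),
                  [link["id"] for link in ordered])

def get_listing(subreddit, sort="hot"):
    # Serving a request is just a lookup; nothing is computed on the spot.
    return cache.get("listing:%s:%s" % (subreddit, sort)) or []
```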
7. Work Offline
When a user makes a request, the system should do only what is absolutely necessary to return a proper response; everything else is delegated to queued jobs that perform the work offline. Examples of work done offline include: pre-computing listings, fetching thumbnails, detecting cheating, removing spam, computing awards, and updating the search index. When a user votes on a link, he does not need to wait until all the indexes and listings are updated; those tasks can be done after a response is returned to the user.
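A rough sketch of the pattern, using a multiprocessing queue purely for illustration rather than Reddit’s actual queueing infrastructure; the job and function names are hypothetical. The request handler records the vote, enqueues the follow-up work, and returns immediately:

```python
from multiprocessing import Process, Queue

offline_jobs = Queue()

def record_vote(user, link_id, direction):
    print("recorded %s vote by %s on %s" % (direction, user, link_id))

def handle_vote(user, link_id, direction):
    record_vote(user, link_id, direction)           # the only work done inline
    # Everything else is queued and handled offline, after the response.
    offline_jobs.put(("update_listings", link_id))
    offline_jobs.put(("update_search_index", link_id))
    offline_jobs.put(("check_for_cheating", user, link_id))
    return "OK"                                     # respond to the user right away

def offline_worker(jobs):
    for job in iter(jobs.get, None):                # None shuts the worker down
        print("processing offline job:", job)

if __name__ == "__main__":
    worker = Process(target=offline_worker, args=(offline_jobs,))
    worker.start()
    print(handle_vote("example_user", "link-42", "up"))
    offline_jobs.put(None)
    worker.join()
```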
The blue arrows in the following picture represent activities performed at the user’s request, while the pink arrows are activities performed offline: