InfoQ

Facebook Chat Architecture

Posted by Gavin Terrill on May 16, 2008

Section: Architecture & Design
Topics: Performance & Scalability, Architecture
Tags: Erlang, C++

On the Facebook engineering blog, Software Engineer Eugene Letuchy recently posted details of the engineering decisions behind Facebook Chat:

when your feature's userbase will go from 0 to 70 million practically overnight, scalability has to be baked in from the start

Eugene identified a number of challenges for a user base of that size, starting with presence notification:

The naive implementation of sending a notification to all friends whenever a user comes online or goes offline has a worst case cost of O(average friendlist size * peak users * churn rate) messages/second, where churn rate is the frequency with which users come online and go offline, in events/second. This is wildly inefficient to the point of being untenable, given that the average number of friends per user is measured in the hundreds, and the number of concurrent users during peak site usage is on the order of several millions.
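To make the quoted cost concrete, here is a minimal Python sketch of the naive approach and its worst-case rate. The function names `notify_friends` and `naive_presence_rate` are hypothetical, and the sketch assumes churn rate is measured per user (transitions per user per second) so that the product comes out in messages per second; the numbers in the usage note are illustrative only, not Facebook's figures.

```python
def notify_friends(user, friends_of, send):
    """Naive presence push: every online/offline transition of `user`
    sends one message to each friend. Returns the number of messages sent."""
    for friend in friends_of[user]:
        send(friend, user)
    return len(friends_of[user])


def naive_presence_rate(avg_friends, peak_users, churn_rate):
    """Worst-case messages/second for naive push.

    churn_rate is assumed to be transitions per user per second, so
    avg_friends * peak_users * churn_rate is in messages/second.
    """
    return avg_friends * peak_users * churn_rate
```

With hypothetical inputs of 200 friends, 2,000,000 concurrent users and one transition per user every 1,000 seconds, `naive_presence_rate(200, 2_000_000, 0.001)` comes to roughly 400,000 messages per second, all spent on presence alone.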

Another challenge was delivering messages in real time. Facebook chose a technique in which the client pulls updates from the server, similar to the XHR long-polling technique used in Comet applications:

The method we chose to get text from one user to another involves loading an iframe on each Facebook page, and having that iframe's Javascript make an HTTP GET request over a persistent connection that doesn't return until the server has data for the client. 
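The server side of long polling can be sketched in a few lines of Python with asyncio. This is an illustration under assumptions, not Facebook's implementation: `LongPollChannel` is a hypothetical per-user mailbox, and a real server would map each pending HTTP GET onto a call to `long_poll`. The request "parks" until data arrives or a timeout expires, then returns everything queued so far in one response.

```python
import asyncio


class LongPollChannel:
    """Hypothetical per-user channel that pending GET requests wait on."""

    def __init__(self):
        self._queue = asyncio.Queue()

    def publish(self, message):
        # Called when another user sends this user a chat message.
        self._queue.put_nowait(message)

    async def long_poll(self, timeout):
        # Block until at least one message arrives, or give up after
        # `timeout` seconds with an empty response (client re-polls).
        try:
            first = await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            return []
        # Drain anything else already queued so one response carries it all.
        batch = [first]
        while not self._queue.empty():
            batch.append(self._queue.get_nowait())
        return batch
```

The timeout matters in practice: proxies and browsers tend to drop connections that stay silent too long, so an empty response followed by an immediate reconnect keeps the channel alive.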

Eugene goes on to mention that "Having a large-number of long-running concurrent requests makes the Apache part of the standard LAMP stack a dubious implementation choice".

Facebook chose a combination of C++ and Erlang to implement clustered and partitioned subsystems. The C++ module logs chat messages, while the Erlang module "holds online users' conversations in-memory and serves the long-polled HTTP requests". The Erlang web server is driven by epoll, the scalable I/O event notification interface available since Linux 2.6. Eugene explains why the team went with Erlang:
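Why epoll helps with many idle long-poll connections can be shown with Python's standard `selectors` module, which uses epoll on Linux. The sketch below is an assumption-laden stand-in for a real web server: local socket pairs play the role of client connections, and the helper names `park_connections`, `poll_once` and `close_all` are hypothetical. A single thread holds all connections and is told only about the ones that have data.

```python
import selectors
import socket


def park_connections(n):
    """Open n socket pairs and register the server-side ends with one
    selector, simulating n long-poll requests parked on a single thread."""
    sel = selectors.DefaultSelector()  # EpollSelector on Linux
    pairs = []
    for conn_id in range(n):
        server_end, client_end = socket.socketpair()
        server_end.setblocking(False)
        sel.register(server_end, selectors.EVENT_READ, data=conn_id)
        pairs.append((server_end, client_end))
    return sel, pairs


def poll_once(sel, timeout):
    """One event-loop iteration: return the ids of readable connections.
    epoll reports only ready descriptors, so idle ones are not scanned."""
    ready = []
    for key, _events in sel.select(timeout):
        key.fileobj.recv(4096)  # drain the pending bytes
        ready.append(key.data)
    return sorted(ready)


def close_all(sel, pairs):
    """Unregister and close every socket, then the selector itself."""
    for server_end, client_end in pairs:
        sel.unregister(server_end)
        server_end.close()
        client_end.close()
    sel.close()
```

Contrast this with a thread- or process-per-request server such as a typical Apache configuration, where each parked request ties up a worker for as long as the connection stays open.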

In short, because the problem domain fits Erlang like a glove. Erlang is a functional concurrency-oriented language with extremely low-weight user-space "processes", share-nothing message-passing semantics, built-in distribution, and a "crash and recover" philosophy proven by two decades of deployment on large soft-realtime production systems.
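Two of the properties Eugene lists, share-nothing message passing and "crash and recover", can be imitated in Python to show what they mean. This is only an analogy: asyncio coroutines are not Erlang processes, and `counter_actor`, `supervise` and `Crash` are hypothetical names. The actor's state lives only in its own locals and is reached only through its mailbox; when it crashes, a supervisor restarts it with fresh state instead of trying to repair it.

```python
import asyncio


class Crash(Exception):
    """Simulated fault inside an actor."""


async def counter_actor(mailbox, replies):
    # Private state: nothing outside this coroutine can touch `count`.
    count = 0
    while True:
        msg = await mailbox.get()
        if msg == "crash":
            raise Crash("simulated fault")
        if msg == "stop":
            return
        count += 1
        await replies.put(count)  # communicate only by sending messages


async def supervise(mailbox, replies, max_restarts=3):
    """Restart the actor with fresh state whenever it crashes.
    Returns how many restarts were needed before a clean stop."""
    restarts = 0
    while True:
        try:
            await counter_actor(mailbox, replies)
            return restarts
        except Crash:
            restarts += 1
            if restarts > max_restarts:
                raise
```

Feeding the mailbox `inc, inc, crash, inc, stop` yields replies 1, 2 and then 1 again: the crash discards the actor's state, and the restarted actor carries on from a clean slate.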

Thrift, Facebook's open source framework for "scalable cross-language services development" (released on April Fools' Day last year), was used to tie together the various technologies in Facebook Chat, and now includes Erlang bindings.

The service was rolled out using an interesting approach, the so-called "dark launch":

The secret for going from zero to seventy million users overnight is to avoid doing it all in one fell swoop. We chose to simulate the impact of many real users hitting many machines by means of a "dark launch" period in which Facebook pages would make connections to the chat servers, query for presence information and simulate message sends without a single UI element drawn on the page.
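A dark launch is easy to express as a flag check in page rendering. The Python sketch below is a hypothetical illustration, not Facebook's code: `FakeChatBackend` only counts load, `render_page` decides per page view what to do, and `in_rollout` uses stable hash bucketing (an assumed technique) so that a chosen percentage of users generate real backend traffic while no chat UI is drawn.

```python
import hashlib


def in_rollout(user_id, percent):
    """Stable bucketing: a given user always falls in the same bucket 0-99."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < percent


class FakeChatBackend:
    """Hypothetical stand-in for the chat servers; it only records load."""

    def __init__(self):
        self.presence_queries = 0
        self.simulated_sends = 0

    def query_presence(self, user_id):
        self.presence_queries += 1
        return []

    def send(self, user_id, text, simulated):
        if simulated:
            self.simulated_sends += 1


def render_page(user_id, backend, dark_percent, ui_enabled):
    """Return the chat widget markup for one page view (possibly empty)."""
    if ui_enabled:
        backend.query_presence(user_id)
        return '<div id="chat"></div>'
    if in_rollout(user_id, dark_percent):
        # Dark launch: exercise the servers exactly like real usage,
        # but draw nothing on the page.
        backend.query_presence(user_id)
        backend.send(user_id, "dark-launch probe", simulated=True)
    return ""
```

Raising `dark_percent` step by step lets operators watch server load climb toward full production levels before any user sees the feature.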

The choice of Erlang by the Facebook engineers is a significant endorsement of the language. Yariv Sadan, a longtime Erlang evangelist, notes:

This announcement should remove any doubts that Erlang is *the* platform for building scalable realtime (aka Comet) applications.
