InfoQ

Scaling Out the Most Popular Social Game, FarmVille

Posted by Abel Avram on Mar 20, 2010

Sections: Architecture & Design
Topics: Architecture, Performance & Scalability
Tags: Social Networking, Caching, Facebook


With 83.75 million monthly active users, FarmVille is the most popular game on Facebook and one of the most popular web-based games on the Internet. To scale out, the application is deployed in the cloud, caches extensively, can turn off some functionality during peak times, and relies on performance monitoring and management.

Launched in June 2009, FarmVille reached its first million users after 4 days and 10 million after 60 days, according to Luke Rajlich, a developer working for Zynga, the game’s creator. With over 80 million monthly active players, FarmVille engages over 20% of all Facebook users and over 1% of the world’s population. Scaling out to this size in such a short time requires specific hardware and software solutions.

InfoQ interviewed Luke Rajlich to find out some architectural details. First of all, the application runs in the cloud on virtualized Linux servers, so it can request and receive additional computing power fairly easily. It is built on a basic LAMP stack, where P stands for PHP, and it uses caching extensively:

We are basically an Object Oriented, MVC application with a custom written DB/Cache interface. We heavily rely on caching, specifically memcache, to support our workload. As well, we have a horizontally sharded database.
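
The interview does not show the DB/Cache interface itself. As a rough sketch of what a cache-first data layer over a horizontally sharded database might look like, the following uses in-memory dictionaries as stand-ins for memcache and the database shards. The class name, the modulo routing rule, and the write-through policy are assumptions made for illustration, not Zynga's implementation.

```python
class ShardedStore:
    """Toy cache-first store; dicts stand in for memcache and DB shards."""

    def __init__(self, num_shards):
        self.cache = {}                                  # stand-in for memcache
        self.shards = [{} for _ in range(num_shards)]    # stand-ins for DB shards
        self.db_reads = 0                                # reads that reached a shard

    def _shard_for(self, user_id):
        # Assumed routing rule: user id modulo the number of shards.
        return self.shards[user_id % len(self.shards)]

    def get(self, user_id):
        # Cache-aside read: try the cache, then fall back to the owning shard.
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        value = self._shard_for(user_id).get(user_id)
        if value is not None:
            self.cache[user_id] = value
        return value

    def put(self, user_id, value):
        # Write-through: update the owning shard, then refresh the cache.
        self._shard_for(user_id)[user_id] = value
        self.cache[user_id] = value


store = ShardedStore(num_shards=4)
store.put(42, {"coins": 100})
print(store.get(42))     # served from the cache
print(store.db_reads)    # counts reads that had to go to a shard
```

A plain modulo rule is the simplest way to show routing, but it moves most keys when the shard count changes; a production system adding capacity would likely need a lookup table or consistent hashing instead.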

To handle traffic spikes, the application relies on adding extra capacity quickly:

Architecturally, we are able to add capacity quickly since the application workload can be partitioned at any layer (load balancer, web server, memcache, database). In addition, we have very specific and formulaic procedure for adding capacity at any given layer. Thus, the execution of adding capacity is easily managed and can be executed quickly. We additionally run on a virtualized environment, thus we can add capacity without directly provisioning additional hardware, which significantly cuts down the time from when we make the decision to add capacity to when we actually have the necessary hardware available. We additionally have adopted configuration tools, such as puppet, that reduce the overhead required to add additional hardware. The difficult part that remains is knowing and finding which part of the application breaks in terms of performance first. In order to accommodate that concern, we have invested time into the aforementioned service degradation as well we spend a considerable amount of time working on application performance monitoring.

The game has a number of components, and when there is a performance bottleneck “we can effectively turn off the less important functionality we use on the platform” to reduce the load on the application:

There are a number of other components [beside the game itself], such as friend ladders, gift requests, etc. We can strip those elements away from the game so that the basic parts of the game aren't as impacted by the performance of those components. This is crucially important as our game is primarily a timing based game where users come back to the game at a certain time to perform certain actions. Those specific actions have a big user experience impact when we have downtime, thus we want to avoid that happening for users.

The application has an unusually write-heavy workload, with a read-to-write ratio of only 3:1, and the team handles it by relying heavily on caching, Rajlich disclosed in an interview with Todd Hoff:

A user's state contains a large amount of data which has subtle and complex relationships. For example, in a farm, objects cannot collide with each other, so if a user places a house on their Farm, the backend needs to check that no other object in that user's farm occupies an overlapping space. Unlike most major site like Google or Facebook, which are read heavy, FarmVille has an extremely heavy write workload. The ratio of data reads to writes 3:1, which is an incredibly high write rate. A majority of the requests hitting the backend for FarmVille in some way modifies the state of the user playing the game. To make this scalable, we have worked to make our application interact primarily with cache components.
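
The collision rule described in the quote can be illustrated with a small overlap check. This is a minimal sketch that assumes farm objects are axis-aligned rectangles on an integer grid; the `Placed` type and the function names are hypothetical, and the real backend's data model is not described in the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placed:
    """An axis-aligned rectangle on the farm grid (hypothetical model)."""
    x: int
    y: int
    width: int
    height: int


def overlaps(a, b):
    # Two rectangles overlap only if their extents overlap on both axes.
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def can_place(new_obj, existing):
    # A placement is valid when no existing object overlaps the new one.
    return not any(overlaps(new_obj, other) for other in existing)


farm = [Placed(0, 0, 2, 2), Placed(5, 5, 3, 3)]
print(can_place(Placed(2, 0, 2, 2), farm))   # shares an edge with the first object
print(can_place(Placed(1, 1, 2, 2), farm))   # overlaps the first object
```

Because every placement has to be validated against the user's current farm before it is written, a check like this is one reason the backend needs the full user state close at hand, which fits the emphasis on cache components.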

Traffic between FarmVille and the Facebook platform peaks at about 3 Gbit/s, so the application can turn off some calls to the platform and keeps the remaining ones from blocking the game itself:

The amount of traffic between FarmVille and the Facebook platform is enormous: at peak, roughly 3 Gigabits/sec of traffic go between FarmVille and Facebook while our caching cluster serves another 1.5 Gigabits/sec to the application. Additionally, since performance can be variable, the application has the ability to dynamically turn off any calls back to the platform. We have a dial that we can tweak that turns off incrementally more calls back to the platform. We have additionally worked to make all calls back to the platform avoid blocking the loading of the application itself. The idea here is that, if all else fails, players can continue to at least play the game.
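
One way to picture the "dial" is as an integer level that disables platform calls in order of importance. The sketch below is an assumption-laden illustration: the call names and their ordering are hypothetical, and it only covers skipping calls and absorbing failures, not the asynchronous, non-blocking loading the quote also mentions.

```python
# Hypothetical platform calls, ordered from least to most important.
CALLS_BY_PRIORITY = ["friend_ladder", "gift_requests", "feed_publish", "profile_lookup"]


def enabled_calls(dial):
    """Return the calls still allowed at a given dial setting.

    dial=0 allows everything; each step up disables the next least
    important call; dial=len(CALLS_BY_PRIORITY) disables all of them.
    """
    dial = max(0, min(dial, len(CALLS_BY_PRIORITY)))
    return set(CALLS_BY_PRIORITY[dial:])


def call_platform(name, dial, fetch, fallback=None):
    # Skip disabled calls, and never let a platform failure break the game.
    if name not in enabled_calls(dial):
        return fallback
    try:
        return fetch()
    except Exception:
        return fallback


def fetch_ladder():
    # Stand-in for a real request to the platform.
    return ["alice", "bob"]


print(call_platform("friend_ladder", 0, fetch_ladder, fallback=[]))   # call is made
print(call_platform("friend_ladder", 1, fetch_ladder, fallback=[]))   # call is skipped
```

Returning a fallback value instead of raising keeps the core game loop independent of the platform, which matches the stated goal that players can at least keep playing if all else fails.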

For performance monitoring and management “we use nagios for alerting, munin for monitoring, and puppet for configuration. We heavily utilize internal stats systems to track performance of the services the application uses, such as Facebook, DB, and Memcache. Additionally, when we see performance degradation, we profile a request's IO events on a sampled basis.”
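
Profiling I/O events "on a sampled basis" could work roughly as follows. This is only a sketch under stated assumptions: the `SampledProfiler` class, its sample rate, and the event names are hypothetical, and Zynga's internal stats systems are not described in the interview.

```python
import random
import time


class SampledProfiler:
    """Records I/O timings for a random fraction of requests."""

    def __init__(self, sample_rate, rng=None):
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.records = []            # (request_id, event_name, seconds)

    def should_sample(self):
        # Decide once per request so all of its events are kept together.
        return self.rng.random() < self.sample_rate

    def timed(self, request_id, event_name, sampled, func):
        # Run func; record its duration only for sampled requests.
        if not sampled:
            return func()
        start = time.perf_counter()
        try:
            return func()
        finally:
            elapsed = time.perf_counter() - start
            self.records.append((request_id, event_name, elapsed))


profiler = SampledProfiler(sample_rate=0.01)     # assumed: profile about 1% of requests
sampled = profiler.should_sample()
value = profiler.timed("req-1", "memcache_get", sampled, lambda: "cached value")
print(value, len(profiler.records))
```

Sampling keeps the measurement overhead proportional to the sample rate, so detailed timing can stay enabled even under peak load.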

As a side note, according to Inside Social Games analyst Justin Smith, Zynga, the company behind FarmVille, made $490 M in revenue last year and expects to make $835 M this year.

  1. What is the technology stack

    by Vikas Hazrati

    Cool that sounds very encouraging.

    What is the tech stack? What kind of servers host the app? Which cloud vendor - public or private?

  2. Re: What is the technology stack

    by Abel Avram

    Hi Vikas,
    the article specifies the stack: " The application runs on a basic LAMP stack, where P stands for PHP." M=MySQL.
    They do run in a cloud, I suppose a public cloud. They have been very laconic at giving details about their relationship with Facebook or their cloud provider.

  3. Amazing

    by Ed Pichler

    One percent of the world play it! Wow!

    Very good article.
