Learning to Scale Websites at Mozilla
Mozilla is scaling websites from thousands to hundreds of millions of users through simple scaling patterns learned internally, according to Brandon Burton, a web operations engineer at Mozilla. The lessons learned cover caching, scaling out web servers, asynchronous jobs, and databases. Brandon presented Mozilla's insights into these topics last week at the Los Angeles DevOps meetup. In addition, Brandon shared Mozilla's future technical plans for DevOps capabilities, including self-service deployments, platform as a service, and usage of public clouds. The following are the main points of Brandon's presentation:
Caching: Three main types of caching enable websites to operate efficiently: in-memory data caches, local asset caches, and global asset caches. An in-memory data cache (e.g. memcache for session state) is an effective way to store state between HTTP requests, which are themselves part of a stateless protocol. Next, a local asset cache/proxy immediately in front of a website can store images and other static files for quick retrieval without the web servers having to process the associated requests; such tools include Stingray, Varnish, and Squid. Lastly, a global asset cache (e.g. a content delivery network, or CDN) performs similar duties to the aforementioned local asset cache, but stores the cached files even closer to the end users. Additionally, these global asset caches (CDNs) dynamically choose the best internet routes based on each user's location. Brandon shared that Mozilla uses both Akamai and EdgeCast as its CDNs.
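The in-memory data cache described above is typically used in a cache-aside pattern: look up the key, and only fall through to the database on a miss. A minimal sketch of that pattern follows; a plain dict with TTL bookkeeping stands in for a real memcached client (such as pylibmc) so the example runs without a server, and the session contents are purely illustrative.

```python
import time

# A dict stands in for a memcached client here; the get/set-with-TTL
# cache-aside pattern is the same with a real in-memory cache.
_cache = {}

def cache_get(key):
    entry = _cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:
        del _cache[key]  # expired entries behave like misses
        return None
    return value

def cache_set(key, value, ttl=300):
    _cache[key] = (value, time.time() + ttl)

def load_session(session_id):
    """Return session data, hitting the 'database' only on a cache miss."""
    session = cache_get("session:" + session_id)
    if session is None:
        session = {"user": "alice", "cart": []}  # stand-in for a DB query
        cache_set("session:" + session_id, session, ttl=1800)
    return session
```

The point is that any web server holding the same cache connection can serve the next request for this session, which is what makes the HTTP tier stateless.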
Scaling out web servers: Web servers scale out by being disposable and configured through automation. Each web server stores no state locally that requires persistence for itself or other web servers across HTTP requests, essentially "sharing nothing". Instead, web servers store their state externally through technologies such as memcache, NFS, or S3. Therefore web servers can be removed from or added to a pool without negatively affecting other web servers. Automation tools (e.g. CFEngine, Opscode Chef, or Puppet) provision web servers into a well-known operational state so they can join the pool of web servers ready to scale. These tools can also manage updates across all web servers.
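The shared-nothing idea can be sketched as follows: each "web server" object keeps no per-user state, so a load balancer can route successive requests for the same session to different servers. The shared dict below is a stand-in for an external store such as memcache, NFS, or S3, and the server/handler names are hypothetical.

```python
# External store shared by all web servers (stand-in for memcache/S3).
SESSION_STORE = {}

class WebServer:
    """A disposable web server: no per-user state lives on the instance."""

    def __init__(self, name):
        self.name = name

    def handle(self, session_id, item=None):
        # All session state is read from and written to the external store.
        session = SESSION_STORE.setdefault(session_id, {"cart": []})
        if item is not None:
            session["cart"].append(item)
        return {"served_by": self.name, "cart": list(session["cart"])}

# Two servers behind a load balancer; because state is external,
# the user sees a consistent cart no matter which server responds.
web1, web2 = WebServer("web1"), WebServer("web2")
web1.handle("s1", "book")
resp = web2.handle("s1")  # different server, same session data
```

Because any server can be disposed of without losing user data, adding or removing capacity is just a matter of provisioning identical instances with the automation tools mentioned above.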
Asynchronous jobs: Users expect quick responses while using websites. Asynchronously processing user requests reduces contention for server computing resources, making individual requests more responsive for the end user. In some cases users request tasks so complicated that there is no other reasonable way to process them except asynchronously while maintaining a semblance of performance. According to Brandon, tools such as Celery and RabbitMQ, working in combination, facilitate asynchronous processing of tasks at Mozilla.
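The broker/worker pattern that Celery and RabbitMQ provide can be sketched with the standard library so the example is self-contained: the request handler enqueues a job and returns immediately, while a worker thread does the slow work out of band. The queue below is a stand-in for the message broker, and the job names are hypothetical.

```python
import queue
import threading

job_queue = queue.Queue()  # stand-in for a broker such as RabbitMQ
results = {}

def worker():
    """Background worker: pulls jobs off the queue and runs them."""
    while True:
        job_id, payload = job_queue.get()
        if job_id is None:  # sentinel to stop the worker
            break
        results[job_id] = payload.upper()  # stand-in for slow work
        job_queue.task_done()

def handle_request(job_id, payload):
    """Fast HTTP-style handler: enqueue and respond without waiting."""
    job_queue.put((job_id, payload))
    return {"status": "accepted", "job": job_id}

threading.Thread(target=worker, daemon=True).start()
response = handle_request("resize-42", "thumbnail")
job_queue.join()  # in practice the client polls; here we wait for the demo
```

The handler's response time no longer depends on how long the job takes, which is the responsiveness gain described above.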
Databases: Mozilla uses multi-master MySQL to provide high availability, and MySQL slaves provide read-only access to data. Additionally, the slaves sit behind a load balancer so that read requests are distributed, preventing individual servers from being overloaded. Fusion-io and Kingston SSD storage adds to the performance of the databases as well. Brandon also made sure to speak about the importance of the "Awesome DBAs" at Mozilla, who maintain the aforementioned database systems.
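The read/write split described above amounts to simple routing logic: writes go to a master, reads are balanced round-robin across the replicas. A minimal sketch follows, with strings standing in for real MySQL connections; the class and host names are hypothetical, not Mozilla's actual setup.

```python
import itertools

class QueryRouter:
    """Route writes to the master and load-balance reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # round-robin balancer

    def connection_for(self, sql):
        verb = sql.lstrip().split()[0].upper()
        if verb == "SELECT":
            return next(self._replicas)  # distribute read load
        return self.master  # INSERT/UPDATE/DELETE must hit the master

router = QueryRouter("master-db", ["replica-1", "replica-2"])
```

In production this logic usually lives in a load balancer or database proxy rather than application code, but the routing decision is the same.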
Brandon shared Mozilla's future plans to improve DevOps. They will be building out self-service deployments through Jenkins. Additionally, they are building a platform as a service for their internal development teams based on ActiveState's Stackato technology. Mozilla is also working toward scaling into the AWS public cloud.
Brandon will give a more technical deep dive into this content at the Scale11x conference this February in Los Angeles. He also helps run hangops, a weekly Google Hangout where DevOps topics are discussed for an hour. Topics include culture, remote working, and operations tools.
Ronny Kohavi Dec 12, 2013