Insight into the Phases of Scaling
Christopher Smith shared insight into approaching and solving the problems of scaling web applications in his presentation "The Five Stages of Scale" at Scale11x last month. He made a case for approaching scaling in stages, with well-defined components that are either added or optimized to improve the overall scale of a web application. He took the audience on an entertaining and informative journey from load balancing through optimized usage of the UDP protocol.
The most important basic scaling architecture is one that lets you add web application servers behind a load balancer. A load balancer allows a web application to scale linearly by partitioning requests and sessions across application servers. Each added application server increases capacity, but this only delays the inevitable C10K problem, because it does not improve the responsiveness of individual requests.
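As a rough illustration of partitioning requests and sessions across application servers, here is a minimal Python sketch. The server names and both helper functions are hypothetical and were not part of the talk; a real load balancer would also handle health checks and servers joining or leaving the pool.

```python
import hashlib
from itertools import cycle

# Hypothetical pool of application servers behind the load balancer.
SERVERS = ["app1", "app2", "app3"]


def round_robin(servers):
    """Return an endless iterator that hands out servers in turn.

    Suitable for stateless requests, where any server can answer.
    """
    return cycle(servers)


def pick_by_session(session_id, servers=SERVERS):
    """Pin a session to one server by hashing its ID (session affinity).

    The same session ID always maps to the same server while the pool
    is unchanged, so session state can stay local to that server.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    balancer = round_robin(SERVERS)
    print([next(balancer) for _ in range(4)])
    print(pick_by_session("user-42"))
```

Adding a server to the pool raises total capacity, but each individual request still takes as long as one application server needs to answer it.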
Christopher spoke about how caching systems placed in front of web applications can provide scale by handling read operations. Multiple caching systems can be used in combination to maximize scale. Memcached servers and the like can store data in memory for quick retrieval by application servers. A reverse proxy can be placed in front of the load balancer to serve cached resources. Finally, a content delivery network (CDN) can be used to put cached resources closer to end users. Caching, however, has its limitations when it comes to writing data.
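The read path described above can be sketched as follows. This is a simplified, in-process stand-in for a cache server such as Memcached, not its actual client API; the class and function names, and the `load_from_db` and `save_to_db` callables, are assumptions made for illustration.

```python
import time


class TTLCache:
    """In-process stand-in for an in-memory cache server (hypothetical)."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value):
        self._items[key] = (time.monotonic() + self.ttl, value)

    def delete(self, key):
        self._items.pop(key, None)


def cached_read(cache, key, load_from_db):
    """Serve a read from the cache, falling back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value


def write(cache, key, value, save_to_db):
    """Writes still reach the database; the cached copy is invalidated."""
    save_to_db(key, value)
    cache.delete(key)
```

Repeated reads of the same key reach the database only once per expiry period, while every write still goes to the database, which is why caching alone does not scale writes.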
An optimized persistence framework takes your ability to scale writes to a new phase. According to Christopher, succeeding in this phase, in combination with the aforementioned techniques, will be sufficient for most people. Choosing a SQL or NoSQL database that matches the application's data structures will significantly improve scale. The ability to perform concurrent reads and writes increases the throughput and responsiveness of write operations. Finally, if you can "Cheat on ACID (Particularly C & D)", that is, relax the consistency and durability guarantees, you can get more writes done faster.
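The talk summary does not say how to cheat on durability, so the following Python sketch shows just one common approach, write-behind batching, as an assumption. The class name and the `flush_batch` callable are hypothetical.

```python
class WriteBehindBuffer:
    """Acknowledge writes from memory and persist them in batches.

    This relaxes durability: anything still in the buffer is lost if
    the process crashes before the next flush.
    """

    def __init__(self, flush_batch, max_pending=100):
        # flush_batch is a hypothetical callable that persists a list
        # of (key, value) pairs in one round trip to the database.
        self._flush_batch = flush_batch
        self._max_pending = max_pending
        self._pending = []

    def write(self, key, value):
        """Buffer the write and return immediately."""
        self._pending.append((key, value))
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self):
        """Persist everything buffered so far as a single batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._flush_batch(batch)
```

With `max_pending=100`, one hundred writes cost one call to the database instead of one hundred, at the price of possibly losing up to 99 acknowledged writes on a crash.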
Underpinning these scaling techniques is the minimization of the latency of data reads and writes by web applications. Christopher shared the latency of different operations on computers:
- L1 cache reference - 0.5 ns
- Branch mispredict - 5 ns
- L2 cache reference - 7 ns
- Mutex lock/unlock - 25 ns
- Main memory reference - 100 ns
- Compress 1K bytes with Zippy - 3,000 ns
- Send 1K bytes over 1 Gbps network - 10,000 ns (or 0.01 ms)
- Read 4K randomly from SSD* - 150,000 ns (or 0.15 ms)
- Read 1 MB sequentially from memory - 250,000 ns (or 0.25 ms)
- Round trip within same datacenter - 500,000 ns (or 0.5 ms)
- Read 1 MB sequentially from SSD* - 1,000,000 ns (or 1 ms)
- Disk seek - 10,000,000 ns (or 10 ms)
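To put these numbers in perspective, a few of them can be compared with simple arithmetic. The sketch below copies values from the list above; the dictionary keys and helper names are made up for illustration, and the "operations per second" figure assumes strictly serial operations with no overlap.

```python
# Latencies in nanoseconds, copied from the list above.
LATENCY_NS = {
    "main_memory_reference": 100,
    "read_4k_random_ssd": 150_000,
    "read_1mb_seq_memory": 250_000,
    "datacenter_round_trip": 500_000,
    "read_1mb_seq_ssd": 1_000_000,
    "disk_seek": 10_000_000,
}


def ratio(slow, fast):
    """How many times longer the slow operation takes than the fast one."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def serial_ops_per_second(op):
    """Upper bound on back-to-back operations of one kind per second."""
    return 1_000_000_000 / LATENCY_NS[op]


if __name__ == "__main__":
    print(ratio("disk_seek", "datacenter_round_trip"))
    print(ratio("read_1mb_seq_ssd", "read_1mb_seq_memory"))
    print(serial_ops_per_second("disk_seek"))
    print(serial_ops_per_second("datacenter_round_trip"))
```

By these figures a disk seek costs as much as 20 round trips within the datacenter, which is why fetching data from another machine's memory can beat reading it from a local disk.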
Beyond the phases above, Christopher listed further techniques:
- Passing code instead of data using commodity servers: Map/Reduce (Hadoop), DHT (Cassandra, HBase, Riak)
- Routing data through data partitions: ESP/CEP, Eigen, Storm, Esper, StreamBase, 0mq, etc.
- Using UDP instead of TCP
The most advanced techniques are in use by companies that manage hyperscale web applications; for example, Facebook uses UDP to perform hundreds of thousands of requests per second against Memcached.
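To show what a request over UDP looks like, here is a minimal Python sketch using the standard `socket` module on the loopback interface. The `get <key>` text format is a toy protocol invented for this example, not Memcached's actual UDP wire format, and all function names are hypothetical.

```python
import socket
import threading


def serve_one_request(sock, store):
    """Answer a single 'get <key>' datagram, then return."""
    data, client_addr = sock.recvfrom(1024)
    _, key = data.decode("utf-8").split(" ", 1)
    sock.sendto(store.get(key, "").encode("utf-8"), client_addr)


def udp_get(server_addr, key, timeout=1.0):
    """Send one request datagram and wait for one reply.

    There is no connection setup or teardown, but a datagram can be
    lost, so the caller must be prepared for a timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        client.sendto(f"get {key}".encode("utf-8"), server_addr)
        reply, _ = client.recvfrom(1024)
        return reply.decode("utf-8")


def demo():
    """Run a one-shot server thread and query it over loopback."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(2.0)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    worker = threading.Thread(
        target=serve_one_request,
        args=(server, {"greeting": "hello"}),
        daemon=True,
    )
    worker.start()
    try:
        return udp_get(server.getsockname(), "greeting")
    finally:
        worker.join(timeout=2.0)
        server.close()


if __name__ == "__main__":
    print(demo())
```

Each lookup is a single datagram in each direction, with no handshake and no per-client connection state on the server. The trade-off is that the application must handle lost or reordered datagrams itself, for example by retrying after a timeout.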