Comet: Sub-Second Latency with 10K+ Concurrent Users
Also known as Reverse AJAX, Comet aims to deliver real-time updates to the client whenever state changes on the server, by leveraging the persistent connection feature of HTTP 1.1. As described in the past on InfoQ.com, Comet is one of several "push technologies" that try to achieve the same goals.
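The core of the long-polling flavor of Comet is a server that holds a request open until it has something to send. Below is a minimal sketch in Python. It assumes an in-memory event list and uses a blocked thread to stand in for a held-open HTTP request. The `EventChannel` class, its cursor scheme and `demo_long_poll` are hypothetical illustrations, not part of Cometd or Bayeux.

```python
import threading
import time


class EventChannel:
    """Server-side state for a minimal long-polling sketch (hypothetical; not the Bayeux/Cometd API)."""

    def __init__(self) -> None:
        self._events = []
        self._cond = threading.Condition()

    def publish(self, event: str) -> None:
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()            # wake every request being held open

    def poll(self, since: int, timeout: float):
        """Hold the request until there are events after index `since`, or until timeout.
        Returns (next cursor, new events); the list is empty on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._events) > since, timeout=timeout)
            return len(self._events), self._events[since:]


def demo_long_poll():
    channel = EventChannel()
    result = {}
    # The "client" request: blocks inside poll() until the server publishes.
    waiter = threading.Thread(target=lambda: result.update(reply=channel.poll(0, 5.0)))
    waiter.start()
    time.sleep(0.05)                           # server-side state changes a little later
    channel.publish("price=42")
    waiter.join()
    return result["reply"]


if __name__ == "__main__":
    print(demo_long_poll())
```

Because each waiting request in this naive form occupies a thread, it scales only as far as the thread budget allows. The Jetty and Lightstreamer designs discussed in this article are aimed at exactly this cost.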
Greg Wilkins and his team at Webtide, a company formed by the lead developers of the open-source Jetty web server, have run a number of performance tests to gauge Comet's scalability and wrote about their findings. Specifically, the tests ran the Dojo Cometd implementation of the Bayeux protocol on Jetty. The Cometd server and the client machines (between 1 and 3), which together generated a load equivalent to up to 20,000 users, were each Amazon EC2 Large Instances. The test results are graphically summarized below:
Following are a few highlights from these tests:
- Sub-second latency was achievable even for 20,000 users. There is a tradeoff between latency and throughput: with 5,000 users, latency rises from 100 ms at 2,000 messages/sec to over 250 ms at 3,000 messages/sec.
- The tested application was a simple chat room with up to 200 users/room. "The load was a 50 byte payload sent in a burst to 10 randomly selected chat rooms at an interval fixed for each test. The interval was selected so that a steady state was obtained with the server CPU at approximately 10% and 50% idle."
- Greg acknowledged that "1 machine just can’t generate/handle the same load as 20K users each with their own computer and network infrastructure". To partially compensate for this limitation, a subset of the tests (see green circles above) spread the simulated users across 3 client machines.
- For the tests with 3 client machines, latency was measured on the machine that simulated 1,000 users. Although this was not measured directly, Greg said that the latency seen by the other 2 client machines, which handled the rest of the 20K users, would have been at most the latency observed in the test with a single client machine.
- A few modifications to the Cometd demo bundled with Jetty 6.1.7 were needed. Some alleviated lock starvation in the server's thread pool, while others changed the setup steps.
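The load description in the highlights above can be turned into a small load model. This is a sketch under stated assumptions. It uses the 200-users-per-room upper bound and assumes one message per room per burst, delivered to every member. Rooms are numbered 0..N-1, which is a hypothetical detail. It also treats the reported "messages/sec" as per-client deliveries, which the original post does not state.

```python
import random

# Parameters taken from the test description; the room numbering is hypothetical.
ROOM_SIZE = 200          # "up to 200 users/room" (upper bound)
ROOMS_PER_BURST = 10     # each burst goes to 10 randomly selected rooms
PAYLOAD_BYTES = 50       # 50-byte payload per message


def pick_rooms(total_rooms: int, rng: random.Random) -> list:
    """Choose the distinct rooms targeted by one burst."""
    return rng.sample(range(total_rooms), ROOMS_PER_BURST)


def deliveries_per_burst(room_size: int = ROOM_SIZE) -> int:
    """ASSUMPTION: one message per room per burst, delivered to every member."""
    return ROOMS_PER_BURST * room_size


def payload_bytes_per_burst(room_size: int = ROOM_SIZE) -> int:
    """Payload only; Bayeux/JSON and HTTP framing are not counted."""
    return deliveries_per_burst(room_size) * PAYLOAD_BYTES


def burst_interval_for(target_msgs_per_sec: float, room_size: int = ROOM_SIZE) -> float:
    """Seconds between bursts needed to reach a target rate.
    ASSUMPTION: 'messages/sec' counts per-client deliveries."""
    return deliveries_per_burst(room_size) / target_msgs_per_sec


if __name__ == "__main__":
    users = 5_000
    rooms = users // ROOM_SIZE                      # 25 rooms of 200 users
    print(sorted(pick_rooms(rooms, random.Random(42))))
    print(deliveries_per_burst(), payload_bytes_per_burst())
    for rate in (2_000, 3_000):
        print(rate, round(burst_interval_for(rate), 3))
```

Under these assumptions, each burst fans out to at most 10 × 200 = 2,000 deliveries (100,000 payload bytes before framing). At that fan-out, 2,000 messages/sec would correspond to about one burst per second, and 3,000 messages/sec to one burst roughly every 0.67 s.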
As mentioned in a comment and in one of Greg's earlier posts, Jetty can flush messages to clients asynchronously, so it needs fewer resources to serve the same number of users. The thread pool code changes made for these tests are available for download, and Greg told InfoQ that they will be part of the next Jetty release. He added that Webtide is running similar tests through load balancers and that more results will be available soon.
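The benefit of asynchronous flushing can be sketched with asyncio queues. This is a Python analogy, not Jetty's NIO-based mechanism. The `Client` class, its `sent` list, `publish` and `run_flush_demo` are hypothetical stand-ins for a connection and its socket writes. Publishing only enqueues and never waits on a slow client. A per-client flusher task drains the queue and coalesces whatever has accumulated into a single write.

```python
import asyncio


class Client:
    """Hypothetical connection; self.sent records each flushed batch (a stand-in for a socket write)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.outbox = asyncio.Queue()
        self.sent = []

    async def flusher(self) -> None:
        while True:
            batch = [await self.outbox.get()]      # wait for work without tying up a thread
            while not self.outbox.empty():         # coalesce everything already queued
                batch.append(self.outbox.get_nowait())
            self.sent.append(batch)                # one "write" per batch
            for _ in batch:
                self.outbox.task_done()


def publish(clients, message: str) -> None:
    """Enqueue for every client; returns immediately even if a client is slow."""
    for client in clients:
        client.outbox.put_nowait(message)


async def run_flush_demo(n_clients: int = 3):
    clients = [Client(f"c{i}") for i in range(n_clients)]
    flushers = [asyncio.create_task(c.flusher()) for c in clients]
    publish(clients, "hello")
    publish(clients, "world")                      # queued before any flusher runs
    await asyncio.gather(*(c.outbox.join() for c in clients))
    for task in flushers:
        task.cancel()
    await asyncio.gather(*flushers, return_exceptions=True)
    return clients


if __name__ == "__main__":
    for c in asyncio.run(run_flush_demo()):
        print(c.name, c.sent)
```

Both messages are queued before any flusher runs, so each client receives them as a single batch. The publisher's cost stays proportional to the number of enqueues, not to how fast each client drains its data.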
Another interesting approach to Comet scalability is the one taken by Lightstreamer. Its implementation is a stand-alone server that does not rely on an underlying application or web server. Some web/application servers that have been extended to act as streaming engines are based on a "one-thread-per-connection model". By contrast, Lightstreamer decouples the number of connections the server can sustain from the number of threads it employs, which allows it to scale to a very large number of clients.
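The difference between one-thread-per-connection and a decoupled design can be illustrated with Python's asyncio. This is an analogy under stated assumptions, not Lightstreamer's implementation. Each simulated connection is a coroutine parked on a shared future, so 10,000 waiting clients are served by a single event-loop thread instead of 10,000 blocked threads. A real server would also need non-blocking socket I/O, which the sketch omits. The function name `serve_parked_clients` is hypothetical.

```python
import asyncio
import threading


async def serve_parked_clients(n_clients: int = 10_000):
    """Park n_clients simulated connections on one event-loop thread,
    then resolve all of them with a single server-side update."""
    loop = asyncio.get_running_loop()
    update = loop.create_future()        # the pending server-side state change
    serving_threads = set()

    async def connection(i: int) -> str:
        serving_threads.add(threading.get_ident())  # which OS thread runs this client
        value = await update                         # parked: a coroutine, not a thread
        return f"client {i} got {value}"

    tasks = [asyncio.create_task(connection(i)) for i in range(n_clients)]
    await asyncio.sleep(0)               # let the connections start and park
    update.set_result("tick")            # one state change wakes every client
    results = await asyncio.gather(*tasks)
    return results, serving_threads


if __name__ == "__main__":
    results, threads = asyncio.run(serve_parked_clients())
    print(len(results), "clients served by", len(threads), "thread(s)")
```

The number of connections here is limited by memory per coroutine rather than by the size of a thread pool. That is the property the "decoupling" described above is after.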
In a conversation with InfoQ, Alessandro Alinone, Lightstreamer's CTO, said that they have customers in the financial industry who achieve in production "an average of 10,000 concurrent users with an average update frequency of 3-5 updates per second per user." He added that "Lightstreamer is also employed as the core engine within TIBCO Ajax Message Service, through an OEM agreement. Therefore, interesting production scenarios are progressively arising on the TIBCO front too."
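Multiplying the two quoted averages gives a rough sense of the aggregate outbound rate such a deployment handles. This is back-of-the-envelope arithmetic that assumes the per-user average applies across all concurrent users. It is not a figure reported by Lightstreamer.

```latex
R_{\text{total}} \approx N \cdot \bar{f}, \qquad
10{,}000 \times 3 = 30{,}000, \qquad
10{,}000 \times 5 = 50{,}000 \ \text{updates/s}
```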
Along with the Server, Lightstreamer's back-end architecture includes:
- A Data Adapter - a plugin module that interfaces Lightstreamer with the data source to be integrated. It can use any technology to integrate with the source, but an asynchronous data feed (e.g. JMS, TIB/RV, MQ) avoids breaking the asynchronous chain that runs all the way to the client.
- A Metadata Adapter - a plugin module that provides the Lightstreamer Server with the metadata of the push scenarios.
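The division of labor between the two adapters can be sketched as a pair of interfaces. This is a hypothetical Python illustration, not Lightstreamer's actual adapter API. `DataAdapter`, `MetadataAdapter`, `QueueFedDataAdapter`, `StaticMetadataAdapter` and `wire` are invented names. The metadata role is reduced to resolving an item group into item names.

```python
from typing import Callable, Protocol

# (item name, field values) -> None; hypothetical listener signature
UpdateListener = Callable[[str, dict], None]


class DataAdapter(Protocol):
    def subscribe(self, item: str, listener: UpdateListener) -> None: ...
    def unsubscribe(self, item: str) -> None: ...


class MetadataAdapter(Protocol):
    def items_for(self, group: str) -> list: ...


class QueueFedDataAdapter:
    """Hypothetical adapter fed by an asynchronous source (stand-in for JMS, TIB/RV, MQ):
    the source calls on_message(), and updates are pushed only for subscribed items."""

    def __init__(self) -> None:
        self._listeners = {}

    def subscribe(self, item: str, listener: UpdateListener) -> None:
        self._listeners[item] = listener

    def unsubscribe(self, item: str) -> None:
        self._listeners.pop(item, None)

    def on_message(self, item: str, fields: dict) -> None:
        listener = self._listeners.get(item)
        if listener is not None:           # drop updates nobody subscribed to
            listener(item, fields)


class StaticMetadataAdapter:
    """Hypothetical metadata adapter: maps a group name to its item names."""

    def __init__(self, groups: dict) -> None:
        self._groups = groups

    def items_for(self, group: str) -> list:
        return list(self._groups.get(group, []))


def wire(meta: MetadataAdapter, data: DataAdapter, group: str, sink: list) -> None:
    """Resolve a group via metadata, then subscribe each item on the data adapter."""
    for item in meta.items_for(group):
        data.subscribe(item, lambda name, fields: sink.append((name, fields)))
```

The `on_message` callback is where an asynchronous feed would hand over updates, so nothing on the path from source to client has to poll.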