How Twitter Answers Handles Five Billion Sessions a Day

by Sergio De Simone on Mar 09, 2015. Estimated reading time: 1 minute

Twitter's Answers is an analytics service for mobile apps that has come to see five billion sessions per day. Ed Solovey, software engineer at Twitter, has described how their system works to provide "reliable, real-time, and actionable" data based on hundreds of millions of mobile devices sending millions of events every second.

As Solovey explains, Answers' main duties are the following:

  • receiving events;
  • archiving them;
  • performing offline and real-time computations;
  • merging the results of those computations into coherent information.

The service in charge of receiving events from the devices is written in Go and uses Amazon Elastic Load Balancer. It enqueues every payload into a durable Kafka queue. Given the sheer number of events to store, Kafka is used only as a temporary cache holding a few hours' worth of data, while Storm is used to transfer the data to Amazon S3.

Once the data is in S3, Amazon Elastic MapReduce is used to process it in batch. The result is stored in a Cassandra cluster to be available for querying through an API.

This is only half of the story, though. Answers also processes data in real time. To this end, the content of the Kafka store is piped into Storm and processed with probabilistic data structures and algorithms such as Bloom filters and HyperLogLog, which provide timely results at the cost of a "negligible loss of accuracy." These results are also stored in Cassandra.
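To illustrate the kind of trade-off involved, here is a minimal Bloom filter sketch in Python. It is not taken from the Answers code base: the class name, the `count_new_devices` helper, the bit-array size, and the hashing scheme are all assumptions chosen for illustration. It shows how a fixed amount of memory can answer "has this device been seen before?" with no false negatives and a small chance of false positives.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; sizes are assumed, not from the source)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def count_new_devices(device_ids, seen: BloomFilter) -> int:
    """Count devices not seen before (hypothetical helper).

    A false positive makes a new device look already seen, so the count
    can only be slightly too low, never too high.
    """
    new = 0
    for device_id in device_ids:
        if device_id not in seen:
            seen.add(device_id)
            new += 1
    return new
```

Memory use stays constant no matter how many device identifiers pass through, which is what makes this family of structures attractive for a stream of this size. The price is the small undercount described in the helper's docstring.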

At the end of the process, Cassandra contains both batch computation results and real-time results. The query API is responsible for combining those two streams into a coherent view based on the query parameters. Results from the batch computation are preferred when available, since they are more precise; otherwise, real-time results are returned.
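The preference rule can be sketched in a few lines of Python. This is only an illustration under assumed names: `merge_results`, the dictionary layout, and the hourly bucket keys are hypothetical and do not reflect the actual query API or Cassandra schema.

```python
def merge_results(batch: dict, realtime: dict, buckets: list) -> dict:
    """Build one view per time bucket, preferring batch over real-time values.

    `batch` and `realtime` map a bucket key to a metric value (assumed layout).
    Buckets present in neither source are left out of the result.
    """
    merged = {}
    for bucket in buckets:
        if bucket in batch:
            merged[bucket] = batch[bucket]      # more precise, use when present
        elif bucket in realtime:
            merged[bucket] = realtime[bucket]   # approximate but timely fallback
    return merged


# Example: the batch job has not yet covered the most recent hour.
batch = {"10:00": 120, "11:00": 95}
realtime = {"11:00": 97, "12:00": 40}
view = merge_results(batch, realtime, ["10:00", "11:00", "12:00", "13:00"])
```

In this example the 11:00 bucket takes the batch value and the 12:00 bucket falls back to the real-time one, while 13:00 is absent because neither source has data for it yet.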

The architecture described here is also effective when it comes to failure handling, says Solovey, thanks to the use of durable queues to connect Answers components. This ensures that an outage in one component does not affect the others. Furthermore, it allows for recovery and guarantees no data loss, provided the system returns to normal operation within a given timeframe.
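The recovery property can be shown with a toy model of a durable log with bounded retention. This is a sketch, not Kafka's API: the `DurableLog` class and its methods are hypothetical, and retention is counted in events here, whereas the article describes it in hours of data.

```python
class DurableLog:
    """Toy append-only log that keeps only the newest `retention` events."""

    def __init__(self, retention: int) -> None:
        self.retention = retention
        self.first_offset = 0   # offset of the oldest event still retained
        self.entries = []

    def append(self, event) -> None:
        self.entries.append(event)
        overflow = len(self.entries) - self.retention
        if overflow > 0:
            # Events older than the retention window are discarded.
            del self.entries[:overflow]
            self.first_offset += overflow

    def read_from(self, offset: int):
        """Return (events, next_offset, lost) for a consumer resuming at `offset`.

        `lost` counts events that expired before the consumer came back.
        """
        lost = max(0, self.first_offset - offset)
        start = max(offset, self.first_offset) - self.first_offset
        events = self.entries[start:]
        return events, self.first_offset + len(self.entries), lost


log = DurableLog(retention=5)
for event in range(3):
    log.append(event)
events, offset, lost = log.read_from(0)            # consumer is healthy

for event in range(3, 7):
    log.append(event)                              # consumer is down, producer keeps going
recovered, offset, lost_short = log.read_from(offset)

for event in range(7, 14):
    log.append(event)                              # outage outlasts the retention window
late, offset, lost_long = log.read_from(offset)
```

After the short outage the consumer resumes from its saved offset and misses nothing. After the long one, the oldest unread events have already expired, which is why the guarantee holds only within the retention timeframe.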
