Dmitriy Ryaboy shares three lessons learned from scaling Twitter's analytics infrastructure: "Data loves a schema", "Make data sources discoverable", and "Make costs visible".
Nathan Marz introduces Twitter Storm, outlining its architecture and use cases, and previews features planned for future releases.
Raffi Krikorian details Twitter's timeline architecture, covering the "write path" and the "read path" that together make it possible to deliver 300k tweets/sec.
Nathan Marz discusses Storm's core concepts (streams, spouts, bolts, and topologies) and explains how to use Storm's Clojure DSL for real-time stream processing, distributed RPC, and continuous computation.
Arya Asemanfar presents Twitter's timeline architecture, walking through every step a tweet goes through until it reaches the timeline of each of its author's followers.
Attila Szegedi shares lessons learned from tuning the JVM at Twitter, devoting most of his talk to memory tuning, CPU usage tuning, and lock contention tuning.
Nathan Marz explains Storm, a distributed, fault-tolerant, real-time computation system that Twitter currently uses to keep statistics on user clicks for every URL and domain.
Nick Kallen discusses how Twitter handles large amounts of data in real time by organizing it into four data types with distinct query patterns (tweets, timelines, social graphs, and search indices), and describes the databases that store them.
Ryan King presents how Twitter uses NoSQL technologies (Gizzard, Cassandra, Hadoop, Redis) to cope with growing data volumes that forced it to scale out beyond what traditional SQL databases offer.
Kevin Weil presents how Twitter analyzes its data, using Scribe for logging, Pig/Hadoop for general analysis, and HBase, Cassandra, and FlockDB for specialized analysis.
Marius Eriksen argues that leaky abstractions lead to scalability issues, while abstractions that provide narrow access to explicit resources (MapReduce, shared-nothing web apps, Bigtable) scale better.