Hairong Kuang explains how Facebook uses HDFS to store and analyze over 100PB of user log data.
Dmitriy Ryaboy shares some of the lessons learned scaling Twitter’s analytics infrastructure: Data loves a schema, Make data sources discoverable, and Make costs visible.
Dhruba Borthakur discusses the different types of data used by Facebook and how they are stored, including graph data, semi-OLTP data, immutable data for pictures, and Hadoop/Hive for analytics.
Serkan Piantino discusses news feeds at Facebook: the basics, infrastructure used, how feed data is stored, and Centrifuge – a storage solution.
Michael Stonebraker compares how RDBMS, NoSQL and NewSQL support today’s big data transaction processing needs. He also introduces VoltDB, an in-memory NewSQL database.
Raffi Krikorian details Twitter’s timeline architecture, its “write path” and “read path”, making it possible to deliver 300k tweets/sec.
Phil Calçado presents how SoundCloud addressed the scalability issues that arose when its user base grew beyond what it could initially support, namely by creating services in various languages.
Ross Lawley introduces MongoDB, explaining why it is a good solution for cloud deployment.
Uri Cohen discusses several types of queues used in the financial and trading industries for highly parallelized data processing, along with their pros and cons.
Jeremy Edberg discusses running Netflix services on AWS: storage, streaming and scaling solutions, multi-region deployments, why Netflix chose the cloud over a private data center, and architectural snapshots.
David Dawson and Marcus Kern share lessons learned creating a high-performance mass audience participation system using NoSQL.