Sean Owen provides examples of operational analytics projects in the field, presenting a reference architecture and algorithm design choices for a successful implementation based on his experience with customers and Oryx/Cloudera.
Josh Wills discusses using Hadoop technologies to build real-time data analysis models with a focus on strategies for data integration, large-scale machine learning, and experimentation.
Dan Frank discusses stream data processing and introduces NSQ – Bitly’s open source queuing system – and other new technologies used for communication between streaming programs.
Oleg Zhurakousky discusses architectural tradeoffs and alternative implementations of real-time, high-speed data ingestion into Hadoop.
Mike Nolet shares lessons learned scaling AppNexus and architectural details of their system, which processes 30 TB/day: Hadoop, a load balancer-free DNS architecture built with GSLB and Keepalived, and real-time data streaming written in C.
Charles Cai and Ashwani Roy discuss a robust, cost-effective, hypothetical solution to address extreme challenges in financial institutions, from decision-making support to pricing and risk management.
Owen Barnes introduces SocketStream, a Node.js framework for building single-page real-time web applications that access all of their data via WebSocket.
Serkan Piantino discusses news feeds at Facebook: the basics, infrastructure used, how feed data is stored, and Centrifuge – a storage solution.
Raffi Krikorian details Twitter’s timeline architecture and its “write path” and “read path”, which make it possible to deliver 300k tweets/sec.
Richard Tibbetts presents a three-tier architecture that stages and analyzes data in real time, stores the results, and delivers them to clients as a service accessible through a variety of interfaces.