Eugene Mandel discusses challenges of conforming data sources and compares processing stacks: Hadoop+Redshift vs Spark, showing how the technology drives the way the problem is modeled.
Randy Shoup tells war stories from Google and eBay about scaling code, infrastructure, performance, and operations, and shares the hard-won lessons learned along the way.
Eugene Dvorkin provides an introduction to the Storm framework, explaining how to build real-time applications on top of Storm with Groovy and how to process data from Twitter in real time.
Garrett Wampole describes an experimental methodology, developed by MITRE, for applying Enterprise Integration Patterns to the near real-time processing of surveillance radar data.
Neha Narkhede of Kafka fame shares the experience of building LinkedIn's powerful and efficient data pipeline infrastructure around Apache Kafka and Samza to process billions of events every day.
The authors discuss patterns and technologies needed to scale large enterprise mobile systems, covering network connectivity handling, data reliability, and real-time communication.
Brian Degenhardt discusses lessons that Twitter learned managing a high rate of change and complexity, and how those can be applied anywhere.
Sean Owen provides examples of operational analytics projects, presenting a reference architecture and algorithm design choices for a successful implementation based on his experience with Oryx/Cloudera.
Josh Wills discusses using Hadoop technologies to build real-time data analysis models with a focus on strategies for data integration, large-scale machine learning, and experimentation.
Chris Riccomini discusses Samza's feature set, how Samza integrates with YARN and Kafka, how it's used at LinkedIn, and what's next on the roadmap.
Avi Bryant discusses how the laws of group theory provide a useful codification of the practical lessons of building efficient distributed and real-time aggregation systems.