Sharad Murthy & Tony Ng present Pulsar, a real-time streaming system which can scale to millions of events per second with high availability and 4GL language support.
Matthew Renze introduces the R programming language and demonstrates how R can be used for exploratory data analysis.
Felienne Hermans presents various algorithms that outlining the power of Excel, showing that spreadsheets are fit for TDD and rapid prototyping.
Neha Narkhede discusses how companies are using Apache Kafka and where it fits in the Big Data ecosystem.
Sandy Ryza aims to give a feel for what it is like to approach financial modeling with modern big data tools, using the Monte Carlo method for a a basic VaR calculation with Spark.
Piotr Kołaczkowski discusses how they integrated Spark with Cassandra, how it was done, how it works in practice and why it is better than using a Hadoop intermediate layer.
The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.
Colin Mower discusses the challenges met using together Cloud, Big Data, Mobile and Security and how these can work together to achieve business value.
Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.
John Davies shows a Spring work-flow consuming 7.4kB XML messages, binding them to 25kB Java but storing them in just 450 bytes each, 10 million derivative contracts in-memory on a laptop.
Lin Qiao discusses the architecture of Gobblin, LinkedIn’s framework for addressing the need of high quality and high velocity data ingestion.
Eugene Mandel discusses challenges of conforming data sources and compares processing stacks: Hadoop+Redshift vs Spark, showing how the technology drives the way the problem is modeled.