Jan Neumann presents how Comcast uses machine learning and big data processing for user search, capacity planning, and predictive caching.
Mathieu Bastian explores the mechanics of unit, integration, data and performance testing for large, complex data workflows, along with the tools for Hadoop, Pig and Spark.
Robert Metzger provides an overview of the Apache Flink internals and its streaming-first philosophy, as well as the programming APIs.
Helena Edelson addresses new architectures emerging for large-scale streaming analytics based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK), Apache Flink, or GearPump.
Joe Stein introduces developers to why and how to use Apache Kafka, a publish-subscribe messaging system rethought as a distributed commit log.
Ilayaperumal Gopinathan and Ludwine Probst discuss Spark and its ecosystem, in particular Spark Streaming and MLlib, providing a concrete example, and showing how to use Spark with Spring XD.
Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.
Scott Seighman discusses the causes of common performance issues in Big Data environments, tuning guidelines for heap size, garbage collection and JVM reuse, and Big Data performance analysis tools.
Viktor Gamov covers In-Memory technology, distributed data topologies, making in-memory reliable, scalable and durable, when to use NoSQL, and techniques for Big In-Memory Data.
Sharad Murthy & Tony Ng present Pulsar, a real-time streaming system which can scale to millions of events per second with high availability and 4GL language support.
Matthew Renze introduces the R programming language and demonstrates how R can be used for exploratory data analysis.
Felienne Hermans presents various algorithms that outline the power of Excel, showing that spreadsheets are fit for TDD and rapid prototyping.