Chinmay Soman and Yi Pan discuss how Uber and LinkedIn use Apache Samza, Calcite and Pinot, along with the analytics platform AthenaX, to transform data and make it available for querying within minutes.
Danny Yuan discusses how Uber is building its next-generation stream processing system to support real-time analytics as well as complex event processing.
Casey Stella talks about discovering missing values, values with skewed distributions and likely errors within data, as well as a novel approach to finding data interconnectedness.
Doug Daniels discusses the cloud-based platform built at Datadog, how it differs from a traditional datacenter-based analytics stack, its pros and cons, and the tooling built around it.
Oleg Zhurakousky discusses the Hadoop ecosystem (Hadoop, HDFS, YARN) and how projects such as Hive, Atlas and NiFi interact and integrate to support the variety of data used for analytics.
Chun-Ho Hung and Nikhil Garg discuss Quanta, Quora's counting system powering their high-volume near-real-time analytics, describing the architecture, design goals, constraints, and choices made.
In this workshop, Troy Magennis explains how to capture data and use it for reliable project forecasting, taking a simple, practical approach that does not require estimating the effort of individual items.
Lawrence Chernin describes best practices and validation methods used to deal with large unstructured data, including a suite of unit tests covering the implementations of algorithmic equations.
Ali Jalali presents how to develop a machine learning predictive analytics engine for big data analytics.
Hilary Parker discusses the history of analysis development tools, the current state of the art, and why it is important for data scientists and analysts to understand programming principles.
Jim Porzak discusses creating an analyst-ready data mart that is complete at different levels of abstraction and models customer decision points in order to understand customers.
Graeme Seaton discusses the drivers behind Big Data initiatives and how to approach them using the vast amounts of data available.