Casey Stella presents a utility written with Apache Spark that automates data preparation by discovering missing values, values with skewed distributions, and likely errors within data.
Greg Murphy describes how GameSparks has designed their platform to be tolerant of many things: unreliable and slow internet connectivity, cloud resources that can fail without warning, and more.
The panelists discuss AI from an investment perspective, the challenges, the risks, trends, the role of Deep Learning, successful AI use cases, and more.
Jim Webber explores the new Causal Clustering architecture for Neo4j, showing how it allows users to read their own writes straightforwardly and explaining why this is difficult to achieve in distributed systems.
Shirshanka Das describes LinkedIn’s Big Data Infrastructure and its evolution through the years, including details on the motivation and architecture of Gobblin, Pinot and WhereHows.
Chinmay Soman and Yi Pan discuss how Uber and LinkedIn use Apache Samza, Calcite and Pinot along with the analytics platform AthenaX to transform data to make it available for querying in minutes.
Elliot Chow discusses the data pipeline that Netflix built with Kafka, Spark Streaming, and Cassandra to process user activities in real time for the Trending Now row.
Danny Yuan discusses how Uber is building its next-generation stream processing system to support real-time analytics as well as complex event processing.
Akshat Vig and Khawaja Shams discuss DynamoDB Streams and what it takes to build an ordered, highly available, durable, performant, and scalable replicated log stream.
Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.
Casey Stella talks about discovering missing values, values with skewed distributions and likely errors within data, as well as a novel approach to finding data interconnectedness.
Slava Oks talks about SQL Server’s history and high-level architecture, and dives into the core of the I/O Manager, Memory Manager, and Scheduler. Topics include lessons learned and behind-the-scenes experiences.