45:00

Data Preparation for Data Science: A Field Guide

Posted by Casey Stella on Apr 23, 2017

Casey Stella presents a utility written with Apache Spark that automates data preparation by discovering missing values, values with skewed distributions, and likely errors within data.

50:39

Building Reliability in an Unreliable World

Posted by Greg Murphy on Apr 20, 2017

Greg Murphy describes how GameSparks has designed their platform to be tolerant of many things: unreliable and slow internet connectivity, cloud resources that can fail without warning, and more.

42:48

AI from an Investment Perspective

Posted by Sanjit Dang, Kiersten Stead, Yashwanth Hemaraj, Pankaj Mitra, Leonard Speiser, Kartik Gada, Doug Dooley on Apr 18, 2017

The panelists discuss AI from an investment perspective, the challenges, the risks, trends, the role of Deep Learning, successful AI use cases, and more.

49:40

Causal Consistency for Large Neo4j Clusters

Posted by Jim Webber on Apr 07, 2017

Jim Webber explores the new causal clustering architecture for Neo4j and how it allows users to read their own writes straightforwardly, and explains why this is difficult to achieve in distributed systems.

50:48

Big Data Infrastructure @ LinkedIn

Posted by Shirshanka Das on Apr 02, 2017

Shirshanka Das describes LinkedIn’s Big Data Infrastructure and its evolution through the years, including details on the motivation and architecture of Gobblin, Pinot and WhereHows.

46:03

Scaling up Near Real-Time Analytics @Uber & LinkedIn

Posted by Chinmay Soman, Yi Pan on Mar 30, 2017

Chinmay Soman and Yi Pan discuss how Uber and LinkedIn use Apache Samza, Calcite and Pinot along with the analytics platform AthenaX to transform data to make it available for querying in minutes.

47:03

Real-Time Recommendations Using Spark Streaming

Posted by Elliot Chow on Mar 30, 2017

Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.

47:47

Stream Processing & Analytics with Flink @Uber

Posted by Danny Yuan on Mar 25, 2017

Danny Yuan discusses how Uber is building its next-generation stream processing system to support real-time analytics as well as complex event processing.

39:21

Demystifying DynamoDB Streams

Posted by Akshat Vig, Khawaja Shams on Mar 25, 2017

Akshat Vig and Khawaja Shams discuss DynamoDB Streams and what it takes to build an ordered, highly available, durable, performant, and scalable replicated log stream.

49:06

Building a Data Science Capability from Scratch

Posted by Victor Hu on Mar 23, 2017

Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.

49:31

Data Cleansing and Understanding Best Practices

Posted by Casey Stella on Mar 23, 2017

Casey Stella talks about discovering missing values, values with skewed distributions and likely errors within data, as well as a novel approach to finding data interconnectedness.

44:45

SQL Server on Linux: Will it Perform or Not?

Posted by Slava Oks on Mar 22, 2017

Slava Oks talks about SQL Server’s history and high-level architecture, and dives into the core of the I/O Manager, Memory Manager, and Scheduler. Topics include lessons learned and behind-the-scenes experiences.
