54:50

Scio: Moving Big Data to Google Cloud, a Spotify Story

Posted by Neville Li on May 26, 2017

Neville Li tells the story of how Spotify migrated its big data infrastructure to Google Cloud, replacing Hive and Scalding with BigQuery and Scio, which helped the company iterate faster.

45:00

Data Preparation for Data Science: A Field Guide

Posted by Casey Stella on Apr 23, 2017

Casey Stella presents a utility written with Apache Spark that automates data preparation by discovering missing values, values with skewed distributions, and likely errors within data.

42:48

AI from an Investment Perspective

Posted by Sanjit Dang, Kiersten Stead, Yashwanth Hemaraj, Pankaj Mitra, Leonard Speiser, Kartik Gada, Doug Dooley on Apr 18, 2017

The panelists discuss AI from an investment perspective: the challenges, risks, and trends, the role of Deep Learning, successful AI use cases, and more.

50:48

Big Data Infrastructure @ LinkedIn

Posted by Shirshanka Das on Apr 02, 2017

Shirshanka Das describes LinkedIn’s Big Data Infrastructure and its evolution through the years, including details on the motivation and architecture of Gobblin, Pinot and WhereHows.

47:03

Real-Time Recommendations Using Spark Streaming

Posted by Elliot Chow on Mar 30, 2017

Elliot Chow discusses the data pipeline built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.

49:06

Building a Data Science Capability from Scratch

Posted by Victor Hu on Mar 23, 2017

Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.

40:48

Data Science in the Cloud @StitchFix

Posted by Stefan Krawczyk on Feb 17, 2017

Stefan Krawczyk discusses how StitchFix used the cloud to enable over 80 data scientists to be productive and have easy access, covering prototyping, algorithms used, keeping schema in sync, etc.

45:26

Petabytes Scale Analytics Infrastructure @Netflix

Posted by Tom Gianos, Dan Weeks on Feb 15, 2017

Tom Gianos and Dan Weeks discuss Netflix's overall big data platform architecture, focusing on storage and orchestration, and how the company uses Parquet on AWS S3 as its data warehouse storage layer.

01:02:53

Big Data in the Real World: Technology and Use Cases

Posted by Mike Olson on Feb 09, 2017

Mike Olson presents several use cases from the automotive, insurance, financial, and other sectors in which big data is collected and analyzed to gather insights.

38:49

Using Bayesian Optimization to Tune Machine Learning Models

Posted by Scott Clark on Feb 07, 2017

Scott Clark introduces Bayesian Global Optimization as an efficient way to optimize ML model parameters, explaining the underlying techniques and comparing it to other standard methods.

32:49

Machine Learning and End-to-End Data Analysis Processes in Spark Using Python and R

Posted by Debraj GuhaThakurta on Feb 05, 2017

Debraj GuhaThakurta discusses ML and data analysis processes in Spark using examples written in Python and R.

50:44

Java (SE) State of the Union

Posted by Gil Tene on Jan 17, 2017

Gil Tene presents the current state of Java SE and OpenJDK, the role of Java in big data and infrastructure components, the JCP, the ecosystem, trends, etc.
