41:46
AI, ML & Data Engineering

Streaming for Personalization Datasets at Netflix

Posted by Shriya Arora on Jul 26, 2017

Shriya Arora discusses the challenges of stream processing unbounded datasets, comparing micro-batch and event-based approaches using Spark and Flink.

54:53
AI, ML & Data Engineering

When Streams Fail: Kafka Off the Shore

Posted by Anton Gorshkov on Jul 18, 2017

Anton Gorshkov discusses how to evaluate and architect a resilient streaming platform, focusing on Kafka and Spark Streaming and sharing his experience using them to process financial transactions.

45:00
AI, ML & Data Engineering

Data Preparation for Data Science: A Field Guide

Posted by Casey Stella on Apr 23, 2017

Casey Stella presents a utility written with Apache Spark that automates data preparation: discovering missing values, values with skewed distributions, and likely errors in the data.

47:03
AI, ML & Data Engineering

Real-Time Recommendations Using Spark Streaming

Posted by Elliot Chow on Mar 30, 2017

Elliot Chow discusses the data pipeline Netflix built with Kafka, Spark Streaming, and Cassandra to process user activity in real time for the Trending Now row.

40:48
AI, ML & Data Engineering

Data Science in the Cloud @StitchFix

Posted by Stefan Krawczyk on Feb 17, 2017

Stefan Krawczyk discusses how StitchFix used the cloud to enable more than 80 data scientists to be productive and have easy access, covering prototyping, the algorithms used, keeping schemas in sync, and more.

32:49
AI, ML & Data Engineering

Machine Learning and End-to-End Data Analysis Processes in Spark Using Python and R

Posted by Debraj GuhaThakurta on Feb 05, 2017

Debraj GuhaThakurta discusses ML and data analysis processes in Spark using examples written in Python and R.

34:13
AI, ML & Data Engineering

MLeap: Release Spark ML Models

Posted by Hollin Wilkins on Dec 04, 2016

Hollin Wilkins discusses the reasons behind MLeap, outlines the programming time saved by using it, shows benchmarks of several online models, and provides a demo and examples of using it in practice.

41:39
AI, ML & Data Engineering

Hydrator: Open Source, Code-Free Data Pipelines

Posted by Jonathan Gray on Oct 23, 2016

Jonathan Gray introduces Hydrator, an open source framework and user interface for creating data lakes and for building and managing data pipelines on Spark, MapReduce, Spark Streaming, and Tigon.

59:07
AI, ML & Data Engineering

Exploring Wikipedia with Apache Spark: A Live Coding Demo

Posted by Sameer Farooqui on Aug 23, 2016

Sameer Farooqui demos connecting to the live stream of Wikipedia edits and building a dashboard that shows, in real time, what’s happening with Wikipedia datasets and how people are using them.

50:44
AI, ML & Data Engineering

Ingest & Stream Processing - What Will You Choose?

Posted by Ted Malaska and Pat Patterson on Aug 14, 2016

Pat Patterson and Ted Malaska talk about current and emerging data processing technologies and the various ways of achieving timely "at-least-once" and "exactly-once" data processing.

30:44
AI, ML & Data Engineering

Monitoring and Troubleshooting Real-Time Data Pipelines

Posted by Premal Shah and Alan Ngai on Jul 20, 2016

Alan Ngai and Premal Shah discuss best practices for monitoring distributed real-time data processing frameworks and how DevOps teams can gain control over and visibility into these data pipelines.

41:26
AI, ML & Data Engineering

Hunting Criminals with Hybrid Analytics

Posted by David Talby on May 10, 2016

David Talby demos using Python libraries to build an ML model for fraud detection, scaling it up to billions of events with Spark, and what it took to make the system performant and production-ready.
