51:39
AI, ML & Data Engineering

Streaming SQL Foundations: Why I ❤ Streams+Tables

Posted by Tyler Akidau on Feb 17, 2018

Tyler Akidau explores the relationship between the Beam Model and stream & table theory, and stream processing in SQL with Apache Beam, Calcite, Flink, Kafka KSQL and Apache Spark’s Structured Streaming.

46:58
AI, ML & Data Engineering

Scaling with Apache Spark

Posted by Holden Karau on Aug 05, 2017

Holden Karau looks at Apache Spark from a performance/scaling point of view and what’s needed to handle large datasets.

47:03
AI, ML & Data Engineering

Real-Time Recommendations Using Spark Streaming

Posted by Elliot Chow on Mar 30, 2017

Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.

59:07
AI, ML & Data Engineering

Exploring Wikipedia with Apache Spark: A Live Coding Demo

Posted by Sameer Farooqui on Aug 23, 2016

Sameer Farooqui demos connecting to the live stream of Wikipedia edits, building a dashboard showing what’s happening with Wikipedia datasets and how people are using them in real time.

33:35
AI, ML & Data Engineering

Apache Beam: The Case for Unifying Streaming APIs

Posted by Andrew Psaltis on Jul 30, 2016

Andrew Psaltis talks about Apache Beam, which aims to provide a unified stream processing model for defining and executing complex data processing, data ingestion and integration workflows.

36:19
AI, ML & Data Engineering

The Mechanics of Testing Large Data Pipelines

Posted by Mathieu Bastian on Apr 24, 2016

Mathieu Bastian explores the mechanics of unit, integration, data and performance testing for large, complex data workflows, along with the tools for Hadoop, Pig and Spark.

43:44
AI, ML & Data Engineering

Rethinking Streaming Analytics for Scale

Posted by Helena Edelson on Apr 03, 2016

Helena Edelson addresses new architectures emerging for large-scale streaming analytics based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK), Apache Flink, or Gearpump.

49:07
AI, ML & Data Engineering

The Lego Model for Machine Learning Pipelines

Posted by Leah McGuire on Jan 16, 2016

Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.

49:53

Lightning Fast Cluster Computing with Spark and Cassandra

Posted by Piotr Kołaczkowski on Jun 17, 2015

Piotr Kołaczkowski discusses how they integrated Spark with Cassandra, how the integration works in practice and why it is better than using a Hadoop intermediate layer.

19:02

Translating Imperative Code to MapReduce

Posted by Cosmin Radoi, Rodric Rabbah, Stephen J Fink, Manu Sridharan on Jun 10, 2015

The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.

48:14

A Taste of Random Decision Forests on Apache Spark

Posted by Sean Owen on Apr 28, 2015

Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.
