38:06
AI, ML & Data Engineering

Accelerated Spark on Azure: Seamless and Scalable Hardware Offloads in the Cloud

Posted by Yuval Degani on Nov 03, 2018

Yuval Degani shows how hardware acceleration in Azure can be used to speed up Spark jobs, with the aid of RDMA (Remote Direct Memory Access) support in the VM.

51:39
AI, ML & Data Engineering

Streaming SQL Foundations: Why I ❤Streams+Tables

Posted by Tyler Akidau on Feb 17, 2018

Tyler Akidau explores the relationship between the Beam Model and stream & table theory, and stream processing in SQL with Apache Beam, Calcite, Flink, Kafka KSQL and Apache Spark’s Structured Streaming.

46:58
AI, ML & Data Engineering

Scaling with Apache Spark

Posted by Holden Karau on Aug 05, 2017

Holden Karau looks at Apache Spark from a performance/scaling point of view and what’s needed to handle large datasets.

47:03
AI, ML & Data Engineering

Real-Time Recommendations Using Spark Streaming

Posted by Elliot Chow on Mar 30, 2017

Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.

59:07
AI, ML & Data Engineering

Exploring Wikipedia with Apache Spark: A Live Coding Demo

Posted by Sameer Farooqui on Aug 23, 2016

Sameer Farooqui demos connecting to the live stream of Wikipedia edits, building a dashboard showing what’s happening with Wikipedia datasets and how people are using them in real time.

33:35
AI, ML & Data Engineering

Apache Beam: The Case for Unifying Streaming APIs

Posted by Andrew Psaltis on Jul 30, 2016

Andrew Psaltis talks about Apache Beam, which aims to provide a unified stream processing model for defining and executing complex data processing, data ingestion and integration workflows.

36:19
AI, ML & Data Engineering

The Mechanics of Testing Large Data Pipelines

Posted by Mathieu Bastian on Apr 24, 2016

Mathieu Bastian explores the mechanics of unit, integration, data and performance testing for large, complex data workflows, along with the tools for Hadoop, Pig and Spark.

43:44
AI, ML & Data Engineering

Rethinking Streaming Analytics for Scale

Posted by Helena Edelson on Apr 03, 2016

Helena Edelson addresses new architectures emerging for large scale streaming analytics based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK) or Apache Flink or GearPump.

49:07
AI, ML & Data Engineering

The Lego Model for Machine Learning Pipelines

Posted by Leah McGuire on Jan 16, 2016

Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.

49:53

Lightning Fast Cluster Computing with Spark and Cassandra

Posted by Piotr Kołaczkowski on Jun 17, 2015

Piotr Kołaczkowski discusses how Spark was integrated with Cassandra, how the integration works in practice and why it is better than using a Hadoop intermediate layer.

19:02

Translating Imperative Code to MapReduce

Posted by Cosmin Radoi, Stephen J Fink, Rodric Rabbah, Manu Sridharan on Jun 10, 2015

The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.

48:14

A Taste of Random Decision Forests on Apache Spark

Posted by Sean Owen on Apr 28, 2015

Sean Owen introduces Spark, Scala and random decision forests, and demonstrates the process of analyzing a real-world data set with them.
