Apache Spark Content on InfoQ

Presentations

AI, ML & Data Engineering

Productionizing H2O Models with Apache Spark

Jakub Hava demonstrates the creation of pipelines integrating H2O machine learning models and their deployments using Scala or Python.

Jakub Hava
on May 09, 2019

34:50
AI, ML & Data Engineering

Accelerated Spark on Azure: Seamless and Scalable Hardware Offloads in the Cloud

Yuval Degani shows how hardware accelerations in Azure can be utilized to speed up Spark jobs, with the aid of RDMA (Remote Direct Memory Access) support in the VM.

Yuval Degani
on Nov 03, 2018

38:06
AI, ML & Data Engineering

Streaming SQL Foundations: Why I ❤ Streams+Tables

Tyler Akidau explores the relationship between the Beam Model and stream & table theory, and stream processing in SQL with Apache Beam, Calcite, Flink, Kafka KSQL and Apache Spark’s Structured Streaming.

Tyler Akidau
on Feb 17, 2018

51:39
AI, ML & Data Engineering

Scaling with Apache Spark

Holden Karau looks at Apache Spark from a performance/scaling point of view and what’s needed to handle large datasets.

Holden Karau
on Aug 05, 2017

46:58
AI, ML & Data Engineering

Real-Time Recommendations Using Spark Streaming

Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.

Elliot Chow
on Mar 30, 2017

47:03
AI, ML & Data Engineering

Exploring Wikipedia with Apache Spark: A Live Coding Demo

Sameer Farooqui demos connecting to the live stream of Wikipedia edits, building a dashboard showing what’s happening with Wikipedia datasets and how people are using them in real time.

Sameer Farooqui
on Aug 23, 2016

59:07
AI, ML & Data Engineering

Apache Beam: The Case for Unifying Streaming APIs

Andrew Psaltis talks about Apache Beam, which aims to provide a unified stream processing model for defining and executing complex data processing, data ingestion and integration workflows.

Andrew Psaltis
on Jul 30, 2016

33:35
AI, ML & Data Engineering

The Mechanics of Testing Large Data Pipelines

Mathieu Bastian explores the mechanics of unit, integration, data and performance testing for large, complex data workflows, along with the tools for Hadoop, Pig and Spark.

Mathieu Bastian
on Apr 24, 2016

36:19
AI, ML & Data Engineering

Rethinking Streaming Analytics for Scale

Helena Edelson addresses new architectures emerging for large-scale streaming analytics based on Spark, Mesos, Akka, Cassandra and Kafka (SMACK) or Apache Flink or GearPump.

Helena Edelson
on Apr 03, 2016

43:44
AI, ML & Data Engineering

The Lego Model for Machine Learning Pipelines

Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.

Leah McGuire
on Jan 16, 2016

49:07
Lightning Fast Cluster Computing with Spark and Cassandra

Piotr Kołaczkowski discusses how they integrated Spark with Cassandra, how the integration works in practice and why it is better than using a Hadoop intermediate layer.

Piotr Kołaczkowski
on Jun 17, 2015

49:53
Translating Imperative Code to MapReduce

The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.

Cosmin Radoi, Manu Sridharan, Stephen J Fink, Rodric Rabbah
on Jun 10, 2015

19:02
