01:30:26

Developing Real-time Data Pipelines with Apache Kafka

Posted by Joe Stein on Mar 04, 2016

Joe Stein gives developers an introduction to why and how to use Apache Kafka, a publish-subscribe messaging system rethought as a distributed commit log.
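To make the "distributed commit log" idea concrete, here is a minimal single-process sketch. It is a toy, not Kafka's API: the class names `CommitLog` and `Consumer` are hypothetical, and it models a single partition with no replication. What it shows is the core property: producers only append, each record gets an offset, and every consumer tracks its own read position, so reading never removes data for other consumers.

```python
class CommitLog:
    """Toy append-only log (one partition): each record gets an increasing offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Records are never modified or deleted, only appended.
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        # Reading is by position; it does not consume or remove anything.
        return self._records[offset:offset + max_records]


class Consumer:
    """Hypothetical consumer that keeps its own position (offset) in the log."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)  # advance only this consumer's offset
        return batch


log = CommitLog()
for event in ["signup", "click", "purchase"]:
    log.append(event)

fast, slow = Consumer(log), Consumer(log)
print(fast.poll())   # the whole log so far
print(slow.poll(1))  # an independent reader starts from offset 0
```

Because positions live with the consumers rather than the log, a slow reader does not hold back a fast one, and a new reader can replay history from offset 0.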

01:24:27

Apache Spark for Big Data Processing

Posted by Ilayaperumal Gopinathan and Ludwine Probst on Feb 14, 2016

Ilayaperumal Gopinathan and Ludwine Probst discuss Spark and its ecosystem, in particular Spark Streaming and MLlib, walk through a concrete example, and show how to use Spark with Spring XD.

49:07

The Lego Model for Machine Learning Pipelines

Posted by Leah McGuire on Jan 16, 2016

Leah McGuire describes the machine learning platform Salesforce wrote on top of Spark to modularize data cleaning and feature engineering.

54:52

Tuning Java for Big Data

Posted by Scott Seighman on Oct 28, 2015

Scott Seighman discusses the causes of common performance issues in Big Data environments, tuning guidelines for heap size, garbage collection and JVM reuse, and tools for Big Data performance analysis.

44:53

Ground-up Introduction to In-memory Data

Posted by Viktor Gamov on Oct 10, 2015

Viktor Gamov covers in-memory technology, distributed data topologies, how to make in-memory data reliable, scalable and durable, when to use NoSQL, and techniques for big in-memory data.

44:41

Pulsar: Real-time Analytics at Scale

Posted by Sharad Murthy and Tony Ng on Sep 13, 2015

Sharad Murthy and Tony Ng present Pulsar, a real-time streaming system that can scale to millions of events per second, with high availability and 4GL support.

48:35

Exploratory Data Analysis with R

Posted by Matthew Renze on Sep 13, 2015

Matthew Renze introduces the R programming language and demonstrates how R can be used for exploratory data analysis.

01:21:47

Spreadsheets for Developers

Posted by Felienne Hermans on Sep 11, 2015

Felienne Hermans presents various algorithms that show the power of Excel, arguing that spreadsheets are fit for TDD and rapid prototyping.

42:09

The Many Faces of Apache Kafka: How is Kafka Used in Practice

Posted by Neha Narkhede on Aug 27, 2015

Neha Narkhede discusses how companies are using Apache Kafka and where it fits in the Big Data ecosystem.

42:33

Financial Modeling with Apache Spark: Calculating Value at Risk

Posted by Sandy Ryza on Jul 12, 2015

Sandy Ryza aims to give a feel for what it is like to approach financial modeling with modern big data tools, using the Monte Carlo method for a basic Value at Risk (VaR) calculation with Spark.
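As a rough illustration of what a "basic" Monte Carlo VaR calculation involves, the sketch below simulates many one-period portfolio returns and reads the loss off the tail of the simulated profit-and-loss distribution. It is an assumption-laden toy, not the model from the talk: returns are drawn from a single normal distribution with made-up parameters, and everything runs in one Python process, whereas the talk runs the calculation with Spark. The function name `monte_carlo_var` is hypothetical.

```python
import random


def monte_carlo_var(portfolio_value, mean_return, volatility,
                    confidence=0.95, trials=100_000, seed=42):
    """Estimate one-period Value at Risk by Monte Carlo simulation.

    Assumption: the portfolio's return is Normal(mean_return, volatility).
    Real VaR models use richer, multi-factor return distributions.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    # Simulate the profit/loss of each trial and sort from worst to best.
    pnl = sorted(portfolio_value * rng.gauss(mean_return, volatility)
                 for _ in range(trials))
    # VaR is the loss at the (1 - confidence) quantile of the P&L distribution.
    index = int((1 - confidence) * trials)
    return -pnl[index]


var_95 = monte_carlo_var(1_000_000, mean_return=0.0, volatility=0.02)
print(f"95% one-period VaR: {var_95:,.0f}")
```

Because each trial is independent, the simulation is embarrassingly parallel; that property is what makes it natural to spread the trials across a cluster. Under these normal-return assumptions the estimate should land close to the closed-form value of roughly 1.645 × 0.02 × 1,000,000 ≈ 32,900.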

49:53

Lightning Fast Cluster Computing with Spark and Cassandra

Posted by Piotr Kołaczkowski on Jun 17, 2015

Piotr Kołaczkowski discusses how Spark was integrated with Cassandra, how the integration works in practice, and why it is better than using Hadoop as an intermediate layer.

19:02

Translating Imperative Code to MapReduce

Posted by Cosmin Radoi, Rodric Rabbah, Stephen J Fink and Manu Sridharan on Jun 10, 2015

The authors present Mold, an approach for automatically translating sequential, imperative Java code into parallel MapReduce-style code that runs on Apache Spark.
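The kind of rewrite such a translation targets can be shown by hand on a small example. The sketch below is not Mold's output or algorithm, and it uses Python rather than Java or Spark; both function names are hypothetical. It contrasts a sequential word count that mutates a shared dictionary with an equivalent map/reduce form, in which each line is processed independently and the partial results are merged with an associative operation.

```python
from collections import Counter
from functools import reduce


def word_count_imperative(lines):
    # Sequential version: one loop mutating shared state, hard to parallelize.
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def word_count_mapreduce(lines):
    # Map phase: each line becomes an independent partial count (no shared state).
    partials = map(lambda line: Counter(line.split()), lines)
    # Reduce phase: merge partial counts; Counter addition is associative,
    # so the merges could be done in any grouping, e.g. as a parallel tree.
    return dict(reduce(lambda a, b: a + b, partials, Counter()))


lines = ["to be or not to be", "be quick"]
print(word_count_mapreduce(lines))
```

The associativity of the merge step is what lets a framework split the map across workers and combine results in any order; the hard part an automatic translator has to solve is proving that an arbitrary imperative loop can be restructured this way.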