Hadoop Content on InfoQ
-
Docker Data Science Pipeline
Lennard Cornelis explains why they chose OpenShift and Docker to connect to the Hadoop environment, and how to set up a Docker container running a data science model using Hive, Python, and Spark.
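As a rough sketch of the kind of job such a container might run, the PySpark script below reads features from a Hive table and writes model scores back; the table, columns, and scoring logic are invented placeholders, not details from the talk.

```python
# score.py - a minimal PySpark job of the sort one might package in a
# Docker image. All table/column names and the "model" are illustrative.
from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark query tables registered in the Hive metastore
spark = (SparkSession.builder
         .appName("dockerized-model")
         .enableHiveSupport()
         .getOrCreate())

# Pull features from a hypothetical Hive table
features = spark.sql("SELECT customer_id, f1, f2 FROM analytics.features")

# Stand-in for a real model: a weighted sum of two features
scored = features.withColumn("score", features.f1 * 0.7 + features.f2 * 0.3)

scored.write.mode("overwrite").saveAsTable("analytics.scores")
spark.stop()
```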
-
Scaling Marketplaces at Thumbtack
Nate Kupp shares some of Thumbtack’s key learnings on their journey to scale and their future with fully-managed systems.
-
Best Trade-off Point Algorithm for Efficient Resource Provisioning in Hadoop
Peter Nghiem presents the Best Trade-off Point method and algorithm with mathematical formulas for obtaining the exact optimal number of task resources for any workload running on Hadoop.
-
Data Science in the Cloud @StitchFix
Stefan Krawczyk discusses how StitchFix used the cloud to enable over 80 data scientists to be productive and have easy access to data, covering prototyping, the algorithms used, keeping schemas in sync, and more.
-
Streaming Live Data and the Hadoop Ecosystem
Oleg Zhurakousky discusses the Hadoop ecosystem (Hadoop, HDFS, YARN) and how projects such as Hive, Atlas, and NiFi interact and integrate to support the variety of data used for analytics.
-
Achieving Mega-Scale Business Intelligence through Speed of Thought Analytics on Hadoop
Ian Fyfe discusses the different options for implementing speed-of-thought business analytics and machine learning tools directly on top of Hadoop.
-
The Mechanics of Testing Large Data Pipelines
Mathieu Bastian explores the mechanics of unit, integration, data, and performance testing for large, complex data workflows, along with the tooling available for Hadoop, Pig, and Spark.
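To make one of those layers concrete, here is a sketch of a unit test for a Spark transformation run against a local session; the transformation under test (dedupe_latest) is invented for illustration, not taken from the talk.

```python
# test_pipeline.py - unit-testing a Spark transformation on a local session.
import unittest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

def dedupe_latest(df):
    """Keep only the newest row (by ts) for each id."""
    w = Window.partitionBy("id").orderBy(F.col("ts").desc())
    return (df.withColumn("rn", F.row_number().over(w))
              .filter(F.col("rn") == 1)
              .drop("rn"))

class DedupeTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # local[2] keeps the test self-contained; no cluster required
        cls.spark = (SparkSession.builder
                     .master("local[2]")
                     .appName("pipeline-tests")
                     .getOrCreate())

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_keeps_latest_row_per_id(self):
        df = self.spark.createDataFrame([(1, 10), (1, 20), (2, 5)],
                                        ["id", "ts"])
        result = {(r.id, r.ts) for r in dedupe_latest(df).collect()}
        self.assertEqual(result, {(1, 20), (2, 5)})

if __name__ == "__main__":
    unittest.main()
```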
-
Hadoop Workflows and Distributed YARN Apps using Spring Technologies
The authors discuss how Spring for Apache Hadoop can simplify developing workflows with MapReduce, Spark, Hive, and Pig jobs, and how Spring Cloud can be used to build distributed apps for YARN.
-
Federated Queries with HAWQ - SQL on Hadoop and Beyond
Christian Tzolov demonstrates different approaches to integrating HAWQ and GemFire, using Spring XD to ingest GemFire data into HDFS and Spring Boot to implement a RESTful proxy for HAWQ.
-
How 30 Years of Ticket Transaction Data Helps you Discover New Shows!
Vaclav Petricek discusses how to train models and how to architect and build a scalable system powered by Storm, Hadoop, Spark, Spring Boot, and Vowpal Wabbit that meets SLAs measured in tens of milliseconds.
-
Lightning Fast Cluster Computing with Spark and Cassandra
Piotr Kołaczkowski discusses how his team integrated Spark with Cassandra, how the integration works in practice, and why it is better than using a Hadoop intermediate layer.
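For flavor, a minimal PySpark read through the DataStax spark-cassandra-connector (one common way to wire the two together); the host, keyspace, table, and column names are placeholders, and the connector must be on the classpath (e.g. via --packages).

```python
# Reading a Cassandra table straight into Spark via the DataStax
# spark-cassandra-connector; names below are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark-cassandra")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

events = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="analytics", table="events")
          .load())

# Filters on partition-key columns can be pushed down to Cassandra,
# avoiding the full-scan-and-stage pattern of a Hadoop intermediate layer.
events.filter(events.user_id == "42").show()
```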
-
Better Together - Using Spark and Redshift to Combine Your Data with Public Datasets
Eugene Mandel discusses the challenges of conforming data sources and compares two processing stacks, Hadoop+Redshift and Spark, showing how the choice of technology drives the way the problem is modeled.
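As a sketch of what combining the two can look like on the Spark side, the snippet below joins a Redshift table with a public dataset in S3 using the spark-redshift connector; every URL, credential, and name is a placeholder, and the connector (plus a Redshift JDBC driver) must be on the classpath.

```python
# Joining a Redshift table with a public dataset in Spark; all paths,
# credentials, and table/column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-join").getOrCreate()

orders = (spark.read
          .format("com.databricks.spark.redshift")
          .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PW")
          .option("dbtable", "orders")
          .option("tempdir", "s3a://my-bucket/tmp")  # staging area for UNLOAD
          .option("forward_spark_s3_credentials", "true")
          .load())

# A public dataset already sitting in S3 (path is illustrative)
zips = spark.read.csv("s3a://public-data/zip_demographics.csv", header=True)

# Assumes both sides carry a zip_code column
orders.join(zips, "zip_code").groupBy("state").count().show()
```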