Apache Hadoop Content on InfoQ

Presentations

DevOps

Docker Data Science Pipeline

Lennard Cornelis explains why they chose OpenShift and Docker to connect to the Hadoop environment, and how to set up a Docker container running a data science model using Hive, Python, and Spark.

Lennard Cornelis
on May 18, 2019

32:40
AI, ML & Data Engineering

Achieving Mega-Scale Business Intelligence through Speed of Thought Analytics on Hadoop

Ian Fyfe discusses the different options for implementing speed-of-thought business analytics and machine learning tools directly on top of Hadoop.

Ian Fyfe
on Oct 26, 2016

30:29
Leading a Healthcare Company to the Big Data Promised Land: A Case Study of Hadoop in Healthcare

Mohammad Quraishi describes implementing a Big Data initiative, detailing preparation, goal evaluation, convincing executives, and post-implementation evaluation.

Mohammad Quraishi
on Nov 29, 2014

42:12
The Next Wave of SQL-on-Hadoop: The Hadoop Data Warehouse

Marcel Kornacker presents a case study of an EDW built on Impala running on 45 nodes, reducing processing time from hours to seconds and consolidating multiple data sets into a single view.

Marcel Kornacker
on Jul 09, 2014

45:01
Next Gen Hadoop

Akmal B. Chaudhri introduces Apache™ Hadoop® 2.0 and Yet Another Resource Negotiator (YARN).

Akmal B. Chaudhri
on Apr 22, 2014

45:22
Data & Infrastructure at Airbnb

Brenden Matthews describes the infrastructure built at Airbnb using Mesos in order to support Hadoop and Storm.

Brenden Matthews
on Dec 31, 2013

46:49
Graph Computing at Scale

Matthias Broecheler discusses graph computing, introducing the Aurelius graph cluster enabling graph computing at scale by building on distributed systems like Cassandra, HBase, and Hadoop.

Matthias Broecheler
on Dec 27, 2013

36:51
Apache Tez: Accelerating Hadoop Query Processing

Bikas Saha and Arun Murthy detail the design of Tez, highlighting some of its features and sharing some of the initial results obtained by Hive on Tez.

Arun Murthy, Bikas Saha
on Dec 05, 2013

38:16
High Speed Smart Data Ingest into Hadoop

Oleg Zhurakousky discusses architectural tradeoffs and alternative implementations of real-time high speed data ingest into Hadoop.

Oleg Zhurakousky
on Oct 24, 2013

53:38
A Guide to Python Frameworks for Hadoop

Uri Laserson reviews the different available Python frameworks for Hadoop, including a comparison of performance, ease of use/installation, differences in implementation, and other features.

Uri Laserson
on Oct 03, 2013

28:12
Leveraging Your Hadoop Cluster Better - Running Performant Code at Scale

Michael Kopp explains how to run performant code at scale with Hadoop and how to analyze and optimize Hadoop jobs.

Michael Kopp
on Aug 16, 2013

35:50
Running the Largest Hadoop DFS Cluster

Hairong Kuang explains how Facebook uses HDFS to store and analyze over 100PB of user log data.

Hairong Kuang
on Mar 15, 2013

44:36
