Apache Hadoop Content on InfoQ
-
Docker Data Science Pipeline
Lennard Cornelis explains the choice of OpenShift and Docker to connect to the Hadoop environment, and how to set up a Docker container running a data science model using Hive, Python, and Spark.
-
Achieving Mega-Scale Business Intelligence through Speed of Thought Analytics on Hadoop
Ian Fyfe discusses the different options for implementing speed-of-thought business analytics and machine learning tools directly on top of Hadoop.
-
Leading a Healthcare Company to the Big Data Promised Land: A Case Study of Hadoop in Healthcare
Mohammad Quraishi describes implementing a Big Data initiative, detailing preparation, goal evaluation, convincing executives, and post-implementation evaluation.
-
The Next Wave of SQL-on-Hadoop: The Hadoop Data Warehouse
Marcel Kornacker presents a case study of an EDW built on Impala running on 45 nodes, reducing processing time from hours to seconds and consolidating multiple data sets into a single view.
-
Next Gen Hadoop
Akmal B. Chaudhri introduces Apache™ Hadoop® 2.0 and Yet Another Resource Negotiator (YARN).
-
Data & Infrastructure at Airbnb
Brenden Matthews describes the infrastructure built at Airbnb using Mesos in order to support Hadoop and Storm.
-
Graph Computing at Scale
Matthias Broecheler discusses graph computing, introducing the Aurelius graph cluster enabling graph computing at scale by building on distributed systems like Cassandra, HBase, and Hadoop.
-
Apache Tez: Accelerating Hadoop Query Processing
Bikas Saha and Arun Murthy detail the design of Tez, highlighting some of its features and sharing some of the initial results obtained by Hive on Tez.
-
High Speed Smart Data Ingest into Hadoop
Oleg Zhurakousky discusses architectural tradeoffs and alternative implementations of real-time high speed data ingest into Hadoop.
-
A Guide to Python Frameworks for Hadoop
Uri Laserson reviews the different available Python frameworks for Hadoop, including a comparison of performance, ease of use/installation, differences in implementation, and other features.
-
Leveraging Your Hadoop Cluster Better - Running Performant Code at Scale
Michael Kopp explains how to run performant code at scale with Hadoop and how to analyze and optimize Hadoop jobs.
-
Running the Largest Hadoop DFS Cluster
Hairong Kuang explains how Facebook uses HDFS to store and analyze over 100PB of user log data.