Jeff Magnusson takes a deep dive into key services of Netflix’s “data platform as a service” architecture, including RESTful services that: provide comprehensive metadata management across data sources (Franklin); enable visualization and caching of results of Hadoop jobs (Sting); and visualize the execution plans produced by languages such as Pig and Hive (Lipstick).
Oleg Zhurakousky discusses architectural tradeoffs and alternative implementations of real-time, high-speed data ingest into Hadoop.
Uri Laserson reviews the Python frameworks available for Hadoop, comparing their performance, ease of use and installation, implementation differences, and other features.
Michael Kopp explains how to run performance code at scale with Hadoop and how to analyze and optimize Hadoop jobs.
Eli Collins gives an overview of how to build new applications with Hadoop and how to integrate Hadoop with existing applications, and provides an update on the state of the Hadoop ecosystem, frameworks, and APIs.
Dean Wampler advocates using functional programming and its core operations to process large amounts of data, and explains why Java’s dominance in Hadoop is harming Big Data’s progress.
Alex Robbins introduces Cascalog, a Clojure library for writing declarative Hadoop jobs.
Hairong Kuang explains how Facebook uses HDFS to store and analyze over 100PB of user log data.
Nikita Ivanov shows how to add real-time capabilities to Hadoop through a demo application that performs streaming word counts on a two-node cluster.
Kathleen Ting details 8 misconfigurations that can bring ZooKeeper down.
Costin Leau discusses Big Data, the tools currently available for dealing with it, and how Spring can be used to create Big Data pipelines.
Nathan Marz introduces Twitter Storm, outlining its architecture and use cases, and looks at features planned for future releases.