Mike Nolet shares lessons learned scaling AppNexus and architectural details of their system processing 30TB/day: Hadoop, a load balancer-free DNS architecture built on GSLB and Keepalived, and real-time data streaming written in C.
Michael Hausenblas introduces Apache Drill, a distributed system for interactive analysis of large-scale datasets, including its architecture and typical use cases.
Uri Laserson reviews the Python frameworks available for Hadoop, comparing their performance, ease of use and installation, implementation differences, and other features; a minimal example of the streaming plumbing these frameworks abstract away follows below.
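To make the comparison concrete, here is a minimal, self-contained sketch (not taken from the talk) of raw Hadoop Streaming in Python, the low-level stdin/stdout contract that frameworks such as mrjob, Dumbo, Pydoop, or Hadoopy wrap in friendlier APIs. The file name and command lines are illustrative assumptions.

```python
#!/usr/bin/env python
# wc.py -- word count over stdin/stdout, the contract Hadoop Streaming
# expects from its mapper and reducer processes.
import sys
from itertools import groupby

def mapper():
    # Emit one tab-separated (word, 1) pair per token read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word.lower())

def reducer():
    # Hadoop Streaming sorts mapper output by key before the reduce
    # phase, so identical words arrive as contiguous runs.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for word, run in groupby(pairs, key=lambda kv: kv[0]):
        print("%s\t%d" % (word, sum(int(count) for _, count in run)))

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Locally the pipeline can be simulated with `cat input.txt | python wc.py map | sort | python wc.py reduce`; on a cluster the same script is passed to the hadoop-streaming JAR as both the `-mapper` and `-reducer` command (the exact JAR path depends on the distribution).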
Rebecca Parsons reviews changes in how data is used and analyzed, including new technology approaches, looking at how data is used to track election violence, follow the movement of people after a natural disaster, and predict famine and other humanitarian crises before they happen.
Karim Chine introduces Elastic-R, demonstrating some of its applications in bioinformatics and finance.
Vaclav Petricek digs up some of the nuggets about romantic interactions hidden in eHarmony's large collection of human relationship data.
Michael Kopp explains how to make code perform at scale with Hadoop and how to analyze and optimize Hadoop jobs, for example by instrumenting them as sketched below.
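One standard analysis aid, shown here as a hedged sketch rather than Kopp's own method, is Hadoop's built-in counters, accessed below through the Python mrjob library; the input format and counter names are invented for the example.

```python
from mrjob.job import MRJob

class MRLatencyProfile(MRJob):
    # Assumed input: tab-separated "endpoint<TAB>latency_ms" lines.
    def mapper(self, _, line):
        parts = line.split("\t")
        if len(parts) != 2:
            # Counters are cheap; use them to quantify dirty input
            # instead of silently dropping it.
            self.increment_counter("quality", "malformed_lines")
            return
        endpoint, latency = parts
        self.increment_counter("quality", "good_lines")
        yield endpoint, float(latency)

    def reducer(self, endpoint, latencies):
        # Average latency per endpoint.
        values = list(latencies)
        yield endpoint, sum(values) / len(values)

if __name__ == "__main__":
    MRLatencyProfile.run()
```

Counter totals appear in the job's summary output; a high ratio of malformed_lines to good_lines points at wasted map work before any profiler is involved.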
Nathan Marz shares lessons learned building Storm, an open-source, distributed, real-time computation system.
Eli Collins provides an overview of how to build new applications with Hadoop and how to integrate Hadoop with existing applications, along with an update on the state of the Hadoop ecosystem, its frameworks, and its APIs.
Dean Wampler advocates using functional programming and its core operations to process large amounts of data, explaining why Java's dominance in Hadoop is harming Big Data's progress; a small illustration of those core operations follows.
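As a quick illustration of what "core operations" means, the snippet below composes map, filter, and reduce over a toy dataset with no mutable state; it is written in Python for brevity and is not code from the talk, and the records are invented for the example.

```python
from functools import reduce

# Invented sample records standing in for a large dataset.
records = [
    {"user": "a", "bytes": 1200, "status": 200},
    {"user": "b", "bytes": 0,    "status": 500},
    {"user": "a", "bytes": 800,  "status": 200},
]

# filter keeps successful requests, map projects the field we need,
# and reduce folds the projected values into a single aggregate:
# the same shape a MapReduce job has at cluster scale.
total_ok_bytes = reduce(
    lambda acc, n: acc + n,
    map(lambda r: r["bytes"],
        filter(lambda r: r["status"] == 200, records)),
    0,
)
print(total_ok_bytes)  # 2000
```

Because each operation is a pure transformation of its input, the same pipeline parallelizes and distributes naturally, which is the crux of the argument for functional primitives over imperative Java idioms.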
Francine Bennett keynotes on using big data in the cloud.
Claudia Perlich keynotes on M6D's approach to Big Data, leveraging data granularity to build predictive models for user targeting, bid optimization, and fraud detection.