Roman Shaposhnik discusses more advanced features of HDFS, in addition to how YARN has enabled businesses to massively scale their systems beyond what was previously possible.
Dean Wampler argues that Spark/Scala is a better data processing engine than MapReduce/Java because tools inspired by mathematics, such as functional programming (FP), are ideal for working with data.
The authors explain how the Pivotal team leveraged familiar SQL-based queries to analyze fine-grained cluster utilization using Spring XD.
Jim Scott keynotes on the history of Hadoop and the difficulties this technology has gone through, and explores why enterprises need to evaluate their targets and prepare for the future.
Details on Pinterest's architecture, its systems (Pinball, Frontdoor), and its stack (MongoDB, Cassandra, Memcache, Redis, Flume, Kafka, EMR, Qubole, Redshift, Python, Java, Go, Nutcracker, Puppet, etc.).
Rusty Sears introduces REEF along with examples of computational frameworks, including interactive sessions, iterative graph processing, bulk synchronous computations, Hive queries, and MapReduce.
Crista Lopes writes a program in multiple styles (monolithic, OOP, continuations, relational, Pub-Sub, Monads, AOP, Map-Reduce), showing the value of using more than one style in large-scale systems.
Dean Wampler discusses the strengths and weaknesses of MapReduce, and the newer variants for big data processing: Pregel and Storm.
Dierk König introduces GPars, Groovy’s library for concurrent programming, explaining a simpler and less error-prone way to use fork/join, map/reduce, actors, and dataflow in Java and Groovy.
Sean Cribbs explains what Map-Reduce and Riak are, why and how to use Map-Reduce with Riak, and how to convert SQL queries into their Map-Reduce equivalents.
Ron Bodkin presents the architecture Quantcast uses to process hundreds of terabytes of data daily with Hadoop on dedicated systems, covering the applications, the types of data processed, and the infrastructure.