Jonathan Gray introduces Hydrator, an open source framework and user interface for creating data lakes for building and managing data pipelines on Spark, MapReduce, Spark Streaming and Tigon.
The authors present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework using Mold, translating Java code to run on Apache Spark.
Roman Shaposhnik discusses more advanced features of HDFS, in addition to how YARN has enabled businesses to massively scale their systems beyond what was previously possible.
Dean Wampler argues that Spark/Scala is a better data processing engine than MapReduce/Java because tools inspired by mathematics, such as FP, are ideal tools for working with data.
The authors explain how the Pivotal team leveraged familiar SQL-based queries to analyze fine-grained cluster utilization using Spring XD.
Jim Scott keynotes on the history of Hadoop, the difficulties that this technology has gone through, exploring the reasons why enterprises need to evaluate their targets and prepare for the future.
Details on Pinterest's architeture, its systems -Pinball, Frontdoor-, and stack - MongoDB, Cassandra, Memcache, Redis, Flume, Kafka, EMR, Qubole, Redshift, Python, Java, Go, Nutcracker, Puppet, etc.
Rusty Sears introduces REEF along with examples of computational frameworks, including interactive sessions, iterative graph processing, bulk synchronous computations, Hive queries, and MapReduce.
Crista Lopes writes a program in multiple styles -monolithic/OOP/continuations/relational/Pub-Sub/Monads/AOP/Map-reduce- showing the value of using more than a style in large scale systems.
Dean Wampler discusses the strengths and weaknesses of MapReduce, and the newer variants for big data processing: Pregel and Storm.
Dierk König introduces GPars, Groovy’s library for concurrent programming, explaining a simpler and less error-prone way to use fork/join, map/reduce, actors, and dataflow in Java and Groovy.