Randy Shoup describes KIXEYE's analytics infrastructure from Kafka queues through Hadoop 2 to Hive and Redshift, built for flexibility, experimentation, iteration, componentization, testability, reliability and replay-ability.
Rusty Sears introduces REEF along with examples of computational frameworks, including interactive sessions, iterative graph processing, bulk synchronous computations, Hive queries, and MapReduce.
Bikas Saha and Arun Murthy detail the design of Tez, highlighting some of its features and sharing some of the initial results obtained by Hive on Tez.
Jeff Magnusson takes a deep dive into key services of Netflix’s “data platform as a service” architecture, including RESTful services that: provide comprehensive metadata management across data sources (Franklin); enable visualization and caching of results of Hadoop jobs (Sting); and visualize the execution plans produced by languages such as Pig and Hive (Lipstick).
Dhruba Borthakur discusses the different types of data used by Facebook and how they are stored, including graph data, semi-OLTP data, immutable data for pictures, and Hadoop/Hive for analytics.
Jake Luciani introduces Brisk, a Hadoop and Hive distribution using Cassandra for core services and storage, presenting the benefits of running Hadoop in a peer-to-peer masterless architecture.