Stefan Krawczyk discusses how StitchFix used the cloud to enable over 80 data scientists to be productive and have easy access, covering prototyping, algorithms used, keeping schema in sync, etc.
Oleg Zhurakousky discusses the Hadoop ecosystem – Hadoop, HDFS, Yarn-, and how projects such as Hive, Atlas, NiFi interact and integrate to support the variety of data used for analytics.
Ian Fyfe discusses the different options for implementing speed-of-thought business analytics and machine learning tools directly on top of Hadoop.
Randy Shoup describes KIXEYE's analytics infrastructure from Kafka queues through Hadoop 2 to Hive and Redshift, built for flexibility, experimentation, iteration, testability, and reliability.
Rusty Sears introduces REEF along with examples of computational frameworks, including interactive sessions, iterative graph processing, bulk synchronous computations, Hive queries, and MapReduce.
Bikas Saha and Arun Murthy detail the design of Tez, highlighting some of its features and sharing some of the initial results obtained by Hive on Tez.
Jeff Magnusson details some of Netflix' key services: Franklin, Sting and Lipstick.
Dhruba Borthakur discusses the different types of data used by Facebook and how they are stored, including graph data, semi-OLTP data, immutable data for pictures, and Hadoop/Hive for analytics.
Jake Luciani introduces Brisk, a Hadoop and Hive distribution using Cassandra for core services and storage, presenting the benefits of running Hadoop in a peer-to-peer masterless architecture.