Nick Kolegraff discusses common problems, an architecture that supports every phase of data science, and how to start a data science initiative, sharing lessons from Accenture, Best Buy, and Rackspace.
Sebastian Kanthak gives an overview of Spanner, detailing how it relies on GPS and atomic clocks to provide two of its most innovative features: lock-free strong (current) reads and global snapshots that are consistent with external events.
Paul King demonstrates working with databases in Groovy, covering datasets, GMongo, Neo4j, raw JDBC, Groovy SQL, CRUD, Hibernate, caching, Spring Data technologies, and more.
Paco Nathan walks through an example data analysis application, written in Cascalog, that implements a recommender system based on the City of Palo Alto's Open Data.
Avi Bryant discusses how the laws of group theory provide a useful codification of the practical lessons learned from building efficient distributed and real-time aggregation systems.
Jeff Magnusson takes a deep dive into key services of Netflix’s “data platform as a service” architecture, including RESTful services that provide comprehensive metadata management across data sources (Franklin), enable visualization and caching of Hadoop job results (Sting), and visualize the execution plans produced by languages such as Pig and Hive (Lipstick).
Tamar Bercovici presents Box’s transition from a single MySQL database to a fully sharded MySQL architecture, all the while serving 2 billion queries per day.
Crista Lopes writes a single program in multiple styles (monolithic, OOP, continuations, relational, pub-sub, monads, AOP, and map-reduce), showing the value of using more than one style in large-scale systems.
Dan Frank discusses stream data processing and introduces NSQ (Bitly’s open-source queuing system) along with other new technologies used for communication between streaming programs.
Jeff Scott Brown demos creating a web application with Grails 2 using the command line, GORM and Hibernate, GSP, and Spring Integration.
Ken Collier discusses Agile Analytics, a combination of sophisticated analytics techniques, lean learning principles, agile delivery methods, and "big data" technologies.
Oleg Zhurakousky discusses architectural trade-offs and alternative implementations of real-time, high-speed data ingestion into Hadoop.