Marco Bonzanini discusses the process of building data pipelines and all the steps necessary to prepare data, focusing on data plumbing and going from prototype to production.
Debraj GuhaThakurta discusses ML and data analysis processes in Spark using examples written in Python and R.
Jeroen Janssens discusses several tricks that help polyglot programmers mix and match different languages and tools in a project.
Tracy Miranda demonstrates Python with the Eclipse Advanced Scripting Environment (EASE) for collaboration, reproducible research, and exploratory computation and data analysis.
David Talby demos building an ML model for fraud detection with Python libraries, scaling it up to billions of events using Spark, and covers what it took to make the system perform well and be ready for production.
Roy Rapoport discusses the power of alignment (or lack thereof) using real-world examples, his experience introducing Python in production, and the organizational structures and culture within Netflix.
Mark Pollack discusses Spring XD and its integrations, driven by the Big Data ecosystem at large, such as Kafka, Spark, functional programming, integration with Python, and designer/monitoring UIs.
The authors demonstrate the design and use of an environment for quantitative researchers, building a market risk simulation first as a basic system and then adding a hypothetical systemic shock.
Uri Laserson reviews the different available Python frameworks for Hadoop, including a comparison of performance, ease of use/installation, differences in implementation, and other features.
Jessica McKellar introduces Twisted, a Python event-driven networking engine, and explains several of the design concepts it uses: the Deferred API, transport/protocol separation, and plug-in infrastructure.
Mike Solomon shares some of the experiences and lessons learned scaling YouTube over the years.
Dustin Getz demonstrates writing monadic code, explaining how monads work and why they are useful.