Data Content on InfoQ
-
Data-Driven Coaching - Safely Turning Team Data into Coaching Insights
Troy Magennis shows how to expose data to teams so that they can run productive retrospectives, determine whether a process experiment is panning out as expected, and explore process change opportunities.
-
Machine Learning in Academia and Industry
Deborah Hanus discusses some of the challenges that can arise when working with data.
-
AI-Based Data Extraction
George Roth presents the challenges of data extraction from unstructured content in the context of preparing the data for Data Analytics.
-
Data Preparation for Data Science: A Field Guide
Casey Stella presents a utility written with Apache Spark to automate data preparation by discovering missing values, values with skewed distributions, and likely errors within data.
-
Straggler Free Data Processing in Cloud Dataflow
Eugene Kirpichov describes the theory and practice behind Cloud Dataflow's approach to straggler elimination, and the associated non-obvious challenges, benefits, and implications of the technique.
-
Scaling up Near Real-Time Analytics @Uber & LinkedIn
Chinmay Soman and Yi Pan discuss how Uber and LinkedIn use Apache Samza, Calcite and Pinot along with the analytics platform AthenaX to transform data to make it available for querying in minutes.
-
Effective Data Pipelines: Data Mngmt from Chaos
Katharine Jarmul discusses implementation decisions for those looking for a practical recommendation on the "what" and "how" of data automation workflows.
-
Building Data Pipelines in Python
Marco Bonzanini discusses the process of building data pipelines and all the steps necessary to prepare data, focusing on data plumbing and going from prototype to production.
-
Data Science in the Cloud @StitchFix
Stefan Krawczyk discusses how StitchFix used the cloud to enable over 80 data scientists to be productive and have easy access, covering prototyping, algorithms used, keeping schema in sync, etc.
-
Scaling the Data Infrastructure @Spotify
Mārtiņš Kalvāns and Matti Pehrs give an overview of the Data Infrastructure at Spotify, diving into some of its components, such as Event Delivery, Datamon and Styx.
-
Data Microservices in the Cloud
Mark Pollack introduces Spring Cloud Data Flow, which enables creating pipelines for data ingestion, real-time analytics and data import/export, and demos apps that are deployed onto multiple runtimes.
-
Targeting Your Audience: Data Visualization to Communicate Data Insights
Randy Krum explains how to use the power of data visualization to convey actionable insights to an audience, making data clear and memorable by showing the audience what the data means.