Data Content on InfoQ
-
Multi-Region Data Streaming with Redpanda
Michał Maślanka introduces the design of Redpanda’s Multi-Region feature, and describes how they leveraged Raft’s properties, a constraint solver, automatic data balancing, and tiered storage.
-
Graph Learning at the Scale of Modern Data Warehouses
Subramanya Dulloor outlines an approach to addressing the challenges of warehouses and shows how to build an efficient and scalable end-to-end system for graph learning in data warehouses.
-
How Netflix Ensures Highly-Reliable Online Stateful Systems
Joseph Lynch discusses the architecture of Netflix's stateful caches and databases, including how they capacity plan, bulkhead, and deploy software to their global, full-active, data topology.
-
Ephemeral Execution is the Future of Computing, but What about the Data?
Jerop Kipruto and Christie Warwick use Tekton to explore challenges of data gravity in ephemeral execution, discussing clean container injection mechanisms and a secure server interface.
-
Improve Feature Freshness in Large Scale ML Data Processing
Zhongliang Liang covers the impact of feature freshness on model performance, discussing various strategies and techniques that can be used to improve feature freshness.
-
The Rise of the Serverless Data Architectures
Gwen Shapira explores the implications of serverless workloads on the design of data stores, and the evolution of data architectures toward more flexible scalability.
-
Building High-Fidelity Data Streams
Sid Anand discusses how they built a lossless streaming data system that guarantees sub-second (p95) event delivery at scale with better than three nines availability.
-
What is Derived Data? (and Do You Already Have Any?)
Felix GV explains what derived data is and dives into four major use cases that fit in the derived data bucket: graphs, search, OLAP, and ML feature storage.
-
Real-Time Machine Learning: Architecture and Challenges
Chip Huyen discusses the value of fresh data as well as different types of architecture and challenges of online prediction.
-
Taming the Data Mess, How Not to Be Overwhelmed by the Data Landscape
Ismaël Mejía reviews the current data landscape and discusses both technical and organizational ideas to avoid being overwhelmed by the lack of consolidation in the data engineering world.
-
Data Versioning at Scale: Chaos and Chaos Management
Einat Orr discusses several technologies that version large data sets, the use cases they support and the technology developed to best support those use cases.
-
Modern Data Pipelines in AdTech—Life in the Trenches
Roksolana Diachuk discusses how to use modern data pipelines for reporting and analytics as well as the case of historical data reprocessing in AdTech.