Streaming Content on InfoQ

Articles

AI, ML & Data Engineering

From Batch to Micro-Batch Streaming: Lessons Learned the Hard Way in a Delta Index Pipeline

This article describes how a production delta-index pipeline migrated from scheduled batch to micro-batch Spark Structured Streaming. It covers why record-level streaming was rejected, how partition-based watermarks replaced fragile S3 completion markers, overlap-window correctness, and restart-as-design strategies for better predictability in object-store–based ingestion systems.

Parveen Saini
on May 04, 2026

Cloud

Building Streaming Infrastructure That Scales: Because Viewers Won't Wait until Tomorrow

In streaming, the challenge is immediate: customers are watching TV right now, not planning to watch it tomorrow. When systems fail during prime time, there is no recovery window; viewers leave and may not return. One and a half years ago, at ProSiebenSat.1 Media SE, we faced the challenge of scaling streaming applications for international users.

Daniele Frasca
on Dec 23, 2025

DevOps

Analyzing Apache Kafka Stretch Clusters: WAN Disruptions, Failure Scenarios, and DR Strategies

This article analyzes the dynamics of Apache Kafka stretch clusters, assessing WAN disruptions and failure scenarios and outlining Disaster Recovery (DR) strategies. The aim is to maintain high availability and data integrity across multi-region deployments and to protect vital services against service level agreement violations.

Srikanth Daggumalli, Nishchai Jayanna Manjula
on Jun 20, 2025

Architecture & Design

Tales of Kafka at Cloudflare: Lessons Learnt on the Way to 1 Trillion Messages

Cloudflare uses Kafka clusters to decouple microservices and to communicate the creation, change, or deletion of various resources in a fault-tolerant manner, using protobuf as a common data format. The authors suggest investing in metrics for problem detection, prioritizing clear SDK documentation, and balancing flexibility and simplicity for standardized pipelines.

Matt Boyle, Andrea Medda
on May 29, 2023

Java

Billions of Messages Per Minute Over TCP/IP

Chronicle Wire offers an alternative way of transferring data between systems, delivering more messages, faster, than common JSON/XML approaches. This approach to data serialization improves both latency and throughput.

George Ball
on Mar 24, 2023

Architecture & Design

Building & Operating High-Fidelity Data Streams

At QCon Plus 2021 last November, Sid Anand, chief architect at Datazoom and PMC Member at Apache Airflow, presented on building high-fidelity nearline data streams as a service within a lean team. In this talk, Anand provides a master class on building high-fidelity data streams from the ground up.

Sid Anand
on Sep 30, 2022

AI, ML & Data Engineering

Streaming-First Infrastructure for Real-Time Machine Learning

This article covers the benefits of streaming-first infrastructure for two scenarios of real-time ML: online prediction, where a model receives a request and makes predictions as soon as the request arrives, and continual learning, where machine learning models continually adapt to changes in data distributions in production.

Chip Huyen
on Aug 22, 2022

Cloud

Designing IoT Solutions with Microsoft Azure

In this article, we will learn how IoT solutions can work with Microsoft Azure and which services are available to perform different operations across multiple domains. It also covers a few common case studies that offer hands-on experience with Azure IoT and provide a good starting point for using cloud-based IoT services.

Kush Mishra
on Mar 02, 2022

Development

How to Create a Network Proxy Using Stream Processor Pipy

In this article we are going to introduce Pipy, an open-source cloud-native network stream processor. After describing its modular design, we will see how to rapidly build a high-performance network proxy to serve our specific needs. Pipy has been battle-tested and is already in use by multiple commercial clients.

Ali Naqvi
on Jan 31, 2022

Cloud

Indestructible Storage in the Cloud with Apache BookKeeper

At Salesforce, we required a storage system that could work with two kinds of streams: one for write-ahead logs and one for data. The two streams, however, have competing requirements. As pioneers in cloud computing, we also required our storage system to be cloud-aware, because availability and durability requirements keep increasing.

Anup Ghatage
on Apr 28, 2021

AI, ML & Data Engineering

The Future of Data Engineering

Chris Riccomini examines the current and future states of the art in data pipelines, data streaming, and data warehousing. He presents a six-stage evolution that data ecosystems follow, from a simple monolith to a complex data-microwarehouse architecture as the data engineers who manage them solve problems and clarify their roles as infrastructure engineers, rather than data stewards.

Chris Riccomini
on Feb 16, 2021

AI, ML & Data Engineering

How Apache Pulsar is Helping Iterable Scale its Customer Engagement Platform

In this article, author Greg Methvin discusses his experience implementing a distributed messaging platform based on Apache Pulsar.

Greg Methvin
on Nov 30, 2020
