Event Stream Processing Content on InfoQ
-
Inside Atlassian’s Forge Billing Architecture for Distributed Usage Tracking at Scale
Atlassian details the Forge billing platform built for usage-based pricing across its cloud ecosystem. It processes large-scale usage events with correct attribution, deduplication, and aggregation using a streaming pipeline, idempotent processing, and layered storage to enable accurate billing, near real-time visibility, and reliable reconciliation across distributed services.
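To make the idea of idempotent processing with deduplication and aggregation concrete, here is a minimal in-memory sketch. It is not Atlassian's implementation: the `UsageEvent` fields, the `UsageAggregator` class, and the use of a producer-assigned event ID as the deduplication key are all assumptions made for illustration, and a real pipeline would keep this state in durable storage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    event_id: str    # hypothetical: unique ID assigned by the producer
    account_id: str  # hypothetical: account the usage is attributed to
    units: int       # hypothetical: metered quantity


class UsageAggregator:
    """Aggregates usage per account and ignores redelivered events."""

    def __init__(self):
        self._seen_ids = set()           # IDs of events already applied
        self._totals = defaultdict(int)  # account_id -> total units

    def process(self, event):
        """Apply the event at most once; return False for a duplicate."""
        if event.event_id in self._seen_ids:
            return False
        self._seen_ids.add(event.event_id)
        self._totals[event.account_id] += event.units
        return True

    def total_for(self, account_id):
        return self._totals.get(account_id, 0)


aggregator = UsageAggregator()
events = [
    UsageEvent("e1", "acct-a", 5),
    UsageEvent("e2", "acct-a", 3),
    UsageEvent("e1", "acct-a", 5),  # redelivery of the first event
    UsageEvent("e3", "acct-b", 7),
]
results = [aggregator.process(e) for e in events]
```

Because the redelivered event is skipped, replaying the same stream leaves the per-account totals unchanged, which is the property that makes retries safe for billing.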
-
30+ Updates per Second per Account: Uber Scales Ledger Processing with Batching
Uber introduced a high-throughput financial ledger processing system designed to handle hot account write contention at scale. Using 250ms batching, Redis coordination, and optimistic atomic updates, the system supports 30+ updates per second per account while preserving consistency and auditability, reducing multi-hour processing pipelines to minutes in its distributed accounting infrastructure.
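The combination of time-window batching and optimistic updates can be sketched as follows. This is an illustration under stated assumptions, not Uber's code: `VersionedLedger` and `AccountBatcher` are hypothetical names, the store is an in-memory dictionary, Redis coordination and audit records are left out, and timestamps are passed in explicitly so the example is deterministic.

```python
from collections import defaultdict

BATCH_WINDOW_SECONDS = 0.250  # the 250ms window mentioned in the summary


class VersionedLedger:
    """In-memory stand-in for a store that supports compare-and-set."""

    def __init__(self):
        self._rows = {}  # account -> (balance, version)

    def read(self, account):
        return self._rows.get(account, (0, 0))

    def compare_and_set(self, account, expected_version, new_balance):
        """Write only if nobody else has updated the row since it was read."""
        _, version = self.read(account)
        if version != expected_version:
            return False
        self._rows[account] = (new_balance, version + 1)
        return True


class AccountBatcher:
    """Collects updates per account and applies each batch as one write."""

    def __init__(self, ledger):
        self._ledger = ledger
        self._pending = defaultdict(list)  # account -> queued amounts
        self._window_start = {}            # account -> time of first queued update

    def submit(self, account, amount, now):
        self._window_start.setdefault(account, now)
        self._pending[account].append(amount)
        if now - self._window_start[account] >= BATCH_WINDOW_SECONDS:
            self.flush(account)

    def flush(self, account):
        amounts = self._pending.pop(account, [])
        self._window_start.pop(account, None)
        if not amounts:
            return
        delta = sum(amounts)
        while True:  # optimistic retry loop: re-read and retry on conflict
            balance, version = self._ledger.read(account)
            if self._ledger.compare_and_set(account, version, balance + delta):
                return


ledger = VersionedLedger()
batcher = AccountBatcher(ledger)
batcher.submit("hot-account", 10, now=0.00)
batcher.submit("hot-account", 5, now=0.10)
batcher.submit("hot-account", -3, now=0.26)  # window elapsed, batch is flushed
```

Three updates to the same account result in a single versioned write, which is how batching reduces contention on a hot row.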
-
Confluent Moves Schema IDs to Kafka Headers to Simplify Schema Governance
Confluent introduces a new approach in Apache Kafka that moves schema IDs from message payloads to record headers, aiming to simplify schema governance and evolution. The update integrates with Schema Registry, improves compatibility across serialization formats, and reduces coupling between data and metadata in event-driven architectures.
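The difference between the two placements can be shown with plain byte handling. The payload-prefix layout below follows the established Confluent wire format (a zero magic byte followed by a 4-byte big-endian schema ID). The header variant is only a sketch: the header key `schema-id` and its 4-byte encoding are assumptions for illustration, not the names or format Confluent uses.

```python
import struct

MAGIC_BYTE = 0
SCHEMA_ID_HEADER = "schema-id"  # hypothetical header key, for illustration only


def encode_with_prefix(schema_id, payload):
    """Payload-prefix style: magic byte + 4-byte big-endian ID + data."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def decode_with_prefix(value):
    magic, schema_id = struct.unpack(">bI", value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte")
    return schema_id, value[5:]


def encode_with_header(schema_id, payload):
    """Header style: the payload is untouched, the ID travels as metadata."""
    headers = [(SCHEMA_ID_HEADER, struct.pack(">I", schema_id))]
    return headers, payload


def decode_with_header(headers, value):
    for key, raw in headers:
        if key == SCHEMA_ID_HEADER:
            return struct.unpack(">I", raw)[0], value
    raise KeyError(SCHEMA_ID_HEADER)


prefixed = encode_with_prefix(42, b"data")
headers, value = encode_with_header(42, b"data")
```

In the header variant the record value is exactly the serialized data, so a consumer that does not know about the schema registry can still read the bytes without stripping a prefix.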
-
Azure Event Hubs Geo-Replication Reaches General Availability
Microsoft has announced the general availability of geo-replication for Azure Event Hubs, improving data availability and redundancy. The feature replicates data across regions to support business continuity during outages. With synchronous and asynchronous options, users can choose their preferred data consistency, backed by additional health metrics for better monitoring.
-
Databricks Contributes Spark Declarative Pipelines to Apache Spark
At the Databricks Data+AI Summit, held in San Francisco, USA, from June 10 to 12, Databricks announced that it is contributing the technology behind Delta Live Tables (DLT) to the Apache Spark project, where it will be called Spark Declarative Pipelines. This move will make it easier for Spark users to develop and maintain streaming pipelines, and furthers Databricks' commitment to open source.
-
Uber Drives Apache Kafka's Tiered Storage Feature; Sparks Efficiency Debate
Apache Kafka, the popular distributed event streaming platform, has introduced a new tiered storage feature in version 3.6.0, initially proposed by Uber engineers. This feature, currently in early access, aims to address the scalability and efficiency challenges faced by organizations running large Kafka clusters.
-
Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day
Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose the latter, primarily because of its much lower cost. The company compared these solutions on performance, maintenance effort, and cost.
-
Queue Support for Apache Kafka: KIP-932 and KMQ from SoftwareMill
The Apache Kafka community is actively working on enabling queue-like use cases for the popular messaging platform as part of the ongoing KIP-932 (Kafka Improvement Proposal). The proposal introduces a share group abstraction for cooperative message consumption. Meanwhile, SoftwareMill created an alternative solution that can work with the existing consumer group abstraction.
-
Local Emulator for Azure Event Hubs in Preview: Offering Developers a Local Development Experience
Microsoft recently released a preview of the local emulator for Azure Event Hubs. The emulator gives developers a local development experience for Azure Event Hubs, allowing them to develop and test code against the service in isolation.
-
Yelp Overhauls Its Streaming Architecture with Apache Beam and Apache Flink
Yelp reworked its data streaming architecture by employing Apache Beam and Apache Flink. The company replaced a fragmented set of data pipelines for streaming transactional data into its analytical systems, such as Amazon Redshift and its in-house data lake, using Apache data streaming projects to create a unified and flexible solution.
-
Real-Time Data Streaming Capabilities with AppSync Integration in Amazon EventBridge Event Bus
AWS recently announced that Amazon EventBridge Event Bus supports AWS AppSync as an event bus target, enabling developers to stream real-time updates, such as sports scores, from their applications to frontend applications, including mobile and desktop.
-
DoorDash Develops New Sessionization Platform with Flink to Improve Notification Delivery Timeliness
DoorDash has significantly enhanced its user engagement by leveraging Apache Flink for real-time session detection and notification delivery. This move marks a substantial advancement in user interaction and cart conversion rates.
-
Expedia Uses WebSockets and Kafka to Query Near Real-Time Streaming Data
Expedia created a solution for querying clickstream data from its platform in near real time, so that product and engineering teams can explore live data while building new data-driven use cases and enhancing existing ones. The team used a combination of WebSockets, Apache Kafka, and PostgreSQL to stream query results continuously to users’ browsers.
-
How HubSpot Uses Apache Kafka Swimlanes for Timely Processing of Workflow Actions
HubSpot routes messages from the same producer across multiple Kafka topics (called swimlanes) to avoid a build-up of consumer group lag and to prioritize the processing of real-time traffic. Using a combination of automatic and manual detection of traffic spikes, the company ensures the majority of customers’ workflows execute without delays.
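The swimlane idea can be illustrated with a small router that picks a topic per message. This is a sketch under assumptions rather than HubSpot's implementation: the topic names, the per-window message threshold, and the `SwimlaneRouter` class are hypothetical, with a simple counter standing in for automatic spike detection and a fixed set standing in for manual overrides.

```python
from collections import Counter

REALTIME_TOPIC = "workflow-actions-realtime"  # hypothetical topic name
OVERFLOW_TOPIC = "workflow-actions-overflow"  # hypothetical topic name
MAX_MESSAGES_PER_WINDOW = 3                   # hypothetical spike threshold


class SwimlaneRouter:
    """Chooses a swimlane (topic) for each message based on customer traffic."""

    def __init__(self, manual_overflow_customers=()):
        self._counts = Counter()                     # messages seen in this window
        self._manual = set(manual_overflow_customers)  # manually flagged customers

    def start_new_window(self):
        """Reset the counters, for example once per time window."""
        self._counts.clear()

    def route(self, customer_id):
        self._counts[customer_id] += 1
        if customer_id in self._manual:
            return OVERFLOW_TOPIC
        if self._counts[customer_id] > MAX_MESSAGES_PER_WINDOW:
            return OVERFLOW_TOPIC  # automatic detection: customer is spiking
        return REALTIME_TOPIC


router = SwimlaneRouter(manual_overflow_customers={"bulk-import"})
spiky = [router.route("customer-1") for _ in range(5)]
quiet = router.route("customer-2")
flagged = router.route("bulk-import")
```

Once a customer exceeds the threshold, its extra messages go to the overflow swimlane, so the lag they cause stays out of the topic that serves everyone else's real-time traffic.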
-
Goldsky’s Streaming-First Architecture for Blockchain Data with Flink, Redpanda and Kubernetes
Goldsky created a platform for the real-time processing of blockchain data. The platform allows clients to extract data from blockchains into their own databases to support product features, but without running the data pipeline infrastructure. The event-driven architecture (EDA) of Goldsky leverages Apache Flink, Redpanda, Kubernetes, and cloud provider services.