Apache Kafka Content on InfoQ
-
Datadog Creates Scalable Data Ingestion Architecture
Datadog created a dedicated data ingestion architecture offering exactly-once semantics for their third-generation event store, Husky. The event-driven architecture (EDA) can accommodate bursts in traffic in the multi-tenant platform with reasonable ingestion latency and acceptable operational costs.
-
Tales of Kafka at Cloudflare: Andrea Medda and Matt Boyle at QCon London
At QCon London, Andrea Medda, senior systems engineer at Cloudflare, and Matt Boyle, engineering manager at Cloudflare, shared the lessons their platform services team learned from enabling the use of Apache Kafka at the scale of 1 trillion messages.
-
Spring for Apache Kafka 3.0 and Spring for RabbitMQ 3.0 Released
VMware has released Spring for Apache Kafka 3.0 and Spring for RabbitMQ 3.0, both requiring Java 17 and Spring Framework 6.0. The projects now support the creation of native GraalVM applications and observation for timers and tracing through the Micrometer metrics facade. Both projects now provide a Bill of Materials (BOM) in the pom.xml file to assist with dependency management.
-
Uber Freight Near-Real-Time Analytics Architecture
Uber Freight is the Uber platform dedicated to connecting shippers with carriers. Providing reliable service to shippers is crucial for Uber Freight. This is why the Carrier Scorecard was developed, with several metrics including on-time pickup/delivery, tracking automation, and late cancellations.
-
Apache Kafka 3.3 Replaces ZooKeeper with the New KRaft Consensus Protocol
The Apache Software Foundation has released Apache Kafka 3.3.1 with many new features and improvements. In particular, this is the first release that marks the KRaft (Kafka Raft) consensus protocol as production-ready. In development for several years, KRaft was released in early access in Kafka 2.8, then in preview in Kafka 3.0.
-
AWS Lambda Supports Event Filtering for Amazon MSK, Kafka and Amazon MQ
Amazon recently announced that AWS Lambda supports content filtering options for Amazon MSK, Self-Managed Kafka, Amazon MQ for Apache ActiveMQ, and Amazon MQ for RabbitMQ as event sources. The new options extend the filtering to data store and broker services and reduce traffic to Lambda functions, simplifying application logic and reducing costs.
-
Netflix Builds a Custom High-Throughput Priority Queue Backed by Redis, Kafka and Elasticsearch
Netflix recently published how it built Timestone, a custom high-throughput, low-latency priority queueing system. They built it using open-source components such as Redis, Apache Kafka, Apache Flink and Elasticsearch. Engineers state that they built Timestone because they could not find an off-the-shelf solution that met all of their requirements.
-
Grab Shared Its Experience in Designing Distributed Data Platform
GrabApp is an application through which customers select and buy their daily needs from merchants. To be scalable and manageable, the data platform and its ingestion should be designed to be distributed and fault-tolerant. In designing this data platform, two classes of data stores are considered: OLTP and OLAP.
-
Confluent Introduces Stream Governance Advanced to Safely Extend Data Streaming Power
Confluent recently announced new enhancements to its Stream Governance product that will improve engineering teams' ability to discover, understand, and trust real-time data. Organizations can use Stream Governance Advanced to resolve issues within complex pipelines more easily with point-in-time lineage.
-
Confluent Ships Stream Designer Democratizing Data Streams
Confluent recently released Stream Designer, a visual interface that lets developers quickly build and deploy streaming data pipelines.
-
Next Generation of Data Movement and Processing Platform at Netflix
Netflix engineering recently described in a tech blog post how they applied data mesh architecture and principles to their next-generation data movement and processing platform to unlock more business use cases and opportunities. Data mesh is a paradigm shift in data management that enables users to easily import and use data without transporting it to a centralized location like a data lake.
-
Fitting Presto to Large-Scale Apache Kafka at Uber
The need for ad-hoc real-time data analysis has been growing at Uber. They run a large Apache Kafka deployment and need to analyse data going through the many workflows it supports. Solutions like stream processing and OLAP datastores were deemed unsuitable. A recently published article details why Uber chose Presto for this purpose and what they had to do to make it performant at scale.
-
Amazon MSK Serverless Now Generally Available
AWS recently announced that Amazon MSK Serverless is now generally available. The serverless option to manage an Apache Kafka cluster removes the need to monitor capacity and automatically balances partitions within a cluster.
-
Netflix Studio Search: Using Elasticsearch and Apache Flink to Index Federated GraphQL Data
Netflix engineers recently published how they built Studio Search, using Apache Kafka streams, an Apache Flink-based Data Mesh process, and Elasticsearch to manage the index. They designed the platform to take a portion of Netflix's federated GraphQL graph and make it searchable. Today, Studio Search powers a significant portion of the user experience for many applications within the organisation.
-
Kestra: a Scalable Open-Source Orchestration and Scheduling Platform
Kestra, a new open-source orchestration and scheduling platform, helps developers build, run, schedule, and monitor complex pipelines. The concept of a workflow, called a Flow in Kestra, is at the heart of the platform. A Flow is a list of tasks defined with a descriptive language based on YAML.