Apache Kafka Content on InfoQ
-
Uber Drives Apache Kafka's Tiered Storage Feature; Sparks Efficiency Debate
Apache Kafka, the popular distributed event streaming platform, has introduced a new tiered storage feature in version 3.6.0, initially proposed by Uber engineers. This feature, currently in early access, aims to address the scalability and efficiency challenges faced by organizations running large Kafka clusters.
-
Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day
Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose KDS, primarily based on its much lower costs. The company compared many aspects of these solutions, like performance, maintenance effort, and cost.
-
Java News Roundup: Hazelcast 5.5, Projects Loom and Valhalla, Hibernate ORM and Validation
This week's Java roundup for July 29th, 2024, features news highlighting: the release of Hazelcast 5.5; early-access releases for Project Loom and Project Valhalla; beta releases of Hibernate ORM 7.0 and Hibernate Validation 9.0; and point releases for Quarkus, Helidon, GlassFish, JobRunr and Testcontainers for Java.
-
Queue Support for Apache Kafka: KIP-932 and KMQ from SoftwareMill
The Apache Kafka community is actively working on enabling queue-like use cases for the popular messaging platform as part of the ongoing KIP-932 (Kafka Improvement Proposal). The proposal introduces a share group abstraction for cooperative message consumption. Meanwhile, SoftwareMill created an alternative solution that can work with the existing consumer group abstraction.
-
Allegro Reduces Kafka Producer Latency Outliers by 82% after Switching to XFS
Allegro experimented with different performance optimization options to improve Apache Kafka producer tail latency and eventually switched all its clusters to the XFS filesystem. The company used Kafka protocol sniffing, JVM profiling, and eBPF, which proved instrumental in identifying and eliminating performance bottlenecks.
-
Yelp Overhauls Its Streaming Architecture with Apache Beam and Apache Flink
Yelp reworked its data streaming architecture by employing Apache Beam and Apache Flink. The company replaced a fragmented set of data pipelines for streaming transactional data into its analytical systems, like Amazon Redshift and an in-house data lake, using Apache data streaming projects to create a unified and flexible solution.
-
QCon London: Lessons Learned from Building LinkedIn’s AI/ML Data Platform
At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence. The presenter shared the lessons learned from evolving and operating the platform, including cluster management and library versioning.
-
CNCF Incubates Strimzi to Simplify Kafka on Kubernetes
The Cloud-Native Computing Foundation (CNCF) has approved Strimzi as an incubating project to streamline the deployment of Apache Kafka on Kubernetes. Strimzi provides a Kubernetes-native way to interact with Kafka through a set of operators that extend the Kubernetes API, making it easier to configure, deploy, and operate Kafka on Kubernetes.
-
Uber Builds Scalable Chat Using Microservices with GraphQL Subscriptions and Kafka
Uber replaced a legacy architecture built using the WAMP protocol with a new solution that takes advantage of GraphQL subscriptions. The main drivers for creating a new architecture were challenges around reliability, scalability, and observability/debuggability, as well as technical debt impeding the team’s ability to maintain the existing solution.
-
Java News Roundup: New OpenJDK JEPs, Spring Functions Catalog, Apache Kafka, Quarkus, JReleaser
This week's Java roundup for February 26th, 2024, features news highlighting: JEP 468, Derived Record Creation (Preview); JEP 467, Markdown Documentation Comments; a new Spring Functions Catalog; end-of-life planned for the Spring Framework 6.0 and 5.3 release trains; and point releases for Apache Kafka, Quarkus and JReleaser.
-
Grab Improves Kafka on Kubernetes Fault Tolerance with Strimzi, AWS AddOns and EBS
Grab updated its Kafka on Kubernetes setup to improve fault tolerance and completely eliminate human intervention in case of unexpected Kafka broker terminations. To address the shortcomings of the initial design, the team integrated with AWS Node Termination Handler (NTH), used the Load Balancer Controller for target group mapping, and switched to EBS volumes for storage.
-
.NET Aspire Preview 3: Expanded Component Support with Azure OpenAI, MySQL, CosmosDB, Kafka and More
Last week, Microsoft announced the availability of the third preview of .NET Aspire. Preview 3 brings changes including UI improvements to the dashboard and new component support for Azure OpenAI, Kafka, Oracle, MySQL, CosmosDB, Orleans, and more.
-
Pinterest Open-Sources a Production-Ready PubSub Java Client for Kafka, Flink, and MemQ
Pinterest open-sourced its generic PubSub client library, PSC, which has been heavily used in production for a year and a half. The library helped the engineering teams by increasing developer velocity and improving the scalability and stability of services using it. Over 90% of the company's Java applications have migrated to PSC with minimal changes.
-
Instacart Creates Real-Time Item Availability Architecture with ML and Event Processing
Instacart combined machine learning with event-based processing to create an architecture that provides customers with an indication of item availability in near real-time. The new solution helped to improve user satisfaction and retention by reducing order cancellations due to out-of-stock items. The team also created a multi-model experimentation framework to help enhance model quality.
-
Zendesk Moves from DynamoDB to MySQL and S3 to Save over 80% in Costs
Zendesk reduced its data storage costs by over 80% by migrating from DynamoDB to a tiered storage solution using MySQL and S3. The company considered different storage technologies and decided to combine the relational database and the object store to strike a balance between queryability and scalability while keeping costs down.