Apache Kafka Content on InfoQ
-
Inside Atlassian Lithium: How a Dynamic ETL Platform is Transforming Data Movement and Cutting Costs
Atlassian recently introduced Lithium, an in-house ETL platform designed to meet the requirements of dynamic data movement. Lithium streamlines tasks such as cloud migrations, scheduled backups, and in-flight data validations by supporting ephemeral pipelines and tenant-level isolation while ensuring efficiency and scalability, resulting in significant cost savings.
-
Stream All the Things: Patterns of Effective Data Stream Processing Explored by Adi Polak at QCon SF
Adi Polak, Director of Advocacy and Developer Experience Engineering at Confluent, illuminated the complexities of data streaming in her QCon San Francisco presentation. She outlined key design patterns for robust pipelines, emphasizing reliability, scalability, and data integrity.
-
Java News Roundup: Spring Cloud, Project Loom, Open Liberty, Groovy, Jakarta EE 11 Update
This week's Java roundup for November 4th, 2024, features news highlighting: the first candidate release of Spring Cloud 2024; an update on Project Loom; the release of Open Liberty 24.0.0.11; point and milestone releases for Apache Groovy; and an update on Jakarta EE 11.
-
Java News Roundup: Jakarta EE 11, GlassFish 8.0-M8, JReleaser 1.15, JHipster 8.7.3, Quarkus 3.16
This week's Java roundup for October 28th, 2024, features news highlighting: an update to the upcoming release of Jakarta EE; the eighth milestone release of GlassFish 8.0; and point releases of JReleaser 1.15.0, JHipster 8.7.3 and Quarkus 3.16.0.
-
Uber Drives Apache Kafka's Tiered Storage Feature; Sparks Efficiency Debate
Apache Kafka, the popular distributed event streaming platform, has introduced a new tiered storage feature in version 3.6.0, initially proposed by Uber engineers. This feature, currently in early access, aims to address the scalability and efficiency challenges faced by organizations running large Kafka clusters.
-
Canva Opts for Amazon KDS over SNS+SQS to Save 85% with 25 Billion Events per Day
Canva evaluated different data messaging solutions for its Product Analytics Platform, including the combination of AWS SNS and SQS, Amazon MSK, and Amazon KDS, and eventually chose KDS, primarily because of its much lower cost. The company compared many aspects of these solutions, such as performance, maintenance effort, and cost.
-
Java News Roundup: Hazelcast 5.5, Projects Loom and Valhalla, Hibernate ORM and Validation
This week's Java roundup for July 29th, 2024, features news highlighting: the release of Hazelcast 5.5; early-access releases for Project Loom and Project Valhalla; beta releases of Hibernate ORM 7.0 and Hibernate Validation 9.0; and point releases for Quarkus, Helidon, GlassFish, JobRunr and Testcontainers for Java.
-
Queue Support for Apache Kafka: KIP-932 and KMQ from SoftwareMill
The Apache Kafka community is actively working on enabling queue-like use cases for the popular messaging platform as part of the ongoing KIP-932 (Kafka Improvement Proposal). The proposal introduces a share group abstraction for cooperative message consumption. Meanwhile, SoftwareMill created an alternative solution that can work with the existing consumer group abstraction.
-
Allegro Reduces Kafka Producer Latency Outliers by 82% after Switching to XFS
Allegro experimented with different performance optimization options to improve Apache Kafka producer tail latency and eventually switched all its clusters to the XFS filesystem. The company used Kafka protocol sniffing, JVM profiling, and eBPF, which proved instrumental in identifying and eliminating performance bottlenecks.
-
Yelp Overhauls Its Streaming Architecture with Apache Beam and Apache Flink
Yelp reworked its data streaming architecture by employing Apache Beam and Apache Flink. The company replaced a fragmented set of data pipelines for streaming transactional data into its analytical systems, such as Amazon Redshift and its in-house data lake, using Apache data streaming projects to create a unified and flexible solution.
-
QCon London: Lessons Learned from Building LinkedIn’s AI/ML Data Platform
At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform powering the company’s products. He specifically delved into Venice DB, the NoSQL data store used for feature persistence. The presenter shared the lessons learned from evolving and operating the platform, including cluster management and library versioning.
-
CNCF Incubates Strimzi to Simplify Kafka on Kubernetes
The Cloud Native Computing Foundation (CNCF) has approved Strimzi as an incubating project to streamline the deployment of Apache Kafka on Kubernetes. Strimzi provides a Kubernetes-native way to interact with Kafka through a set of operators that extend the Kubernetes API, making it easier to configure, deploy, and operate Kafka on Kubernetes.
-
Uber Builds Scalable Chat Using Microservices with GraphQL Subscriptions and Kafka
Uber replaced a legacy architecture built using the WAMP protocol with a new solution that takes advantage of GraphQL subscriptions. The main drivers for creating a new architecture were challenges around reliability, scalability, and observability/debuggability, as well as technical debt impeding the team’s ability to maintain the existing solution.
-
Java News Roundup: New OpenJDK JEPs, Spring Functions Catalog, Apache Kafka, Quarkus, JReleaser
This week's Java roundup for February 26th, 2024, features news highlighting: JEP 468, Derived Record Creation (Preview); JEP 467, Markdown Documentation Comments; a new Spring Functions Catalog; end-of-life planned for the Spring Framework 6.0 and 5.3 release trains; and point releases for Apache Kafka, Quarkus and JReleaser.
-
Grab Improves Kafka on Kubernetes Fault Tolerance with Strimzi, AWS AddOns and EBS
Grab updated its Kafka on Kubernetes setup to improve fault tolerance and completely eliminate human intervention in case of unexpected Kafka broker terminations. To address the shortcomings of the initial design, the team integrated with AWS Node Termination Handler (NTH), used the Load Balancer Controller for target group mapping, and switched to EBS volumes for storage.