Jay Kreps on Distributed Stream Processing with Apache Kafka and Kafka Streams

| by Srini Penchikala Follow 22 Followers on Oct 16, 2016. Estimated reading time: 2 minutes |

A note to our readers: You asked so we have developed a set of features that allow you to reduce the noise: you can get email and web notifications for topics you are interested in. Learn more about our new features.

Apache Kafka and Kafka Streams frameworks help with developing stream-centric architectures and distributed stream processing applications. Jay Kreps, CEO of Confluent, gave the keynote presentation on stream processing and microservices at Reactive Summit 2016 Conference last week.

Kreps said there has been a lot of research on the database technologies but not much on message queues. Messaging can be used as the services backbone in microservices-based application architectures.

He talked about the three paradigms of programming: Request/Response, Batch, and Stream Processing, and the differences between these paradigms. A stream processing approach can be used for both online and batch use cases. It’s important to note that stream processing is not faster MapReduce; it’s a different paradigm of how data is processed and analyzed. Kreps discussed the four core APIs of Kafka-based stream processing: Producer, Consumer, Connector, and Streams.

Kafka Streams is a Java library for building fault-tolerant, distributed stream processing applications. It supports API methods like map, filter, aggregate (count, sum), and join.

In another keynote presentation at the conference, Peter Alvaro from UC Santa Cruz talked about automated failure testing large-scale distributed fault-tolerant systems. He talked about the Lineage-Driven Fault Injection (LDFI) approach which utilizes the tracing and logging information to identify redundant computations to help with testing.

Peter talked about the four requirements needed for failure testing of systems:

  • Real problems
  • Real systems
  • Time to think
  • Freedom To fail

For more information on this topic, you can check out the Netflix blog post on automated failure testing.

Other sessions at the second day of the conference included monolith to reactive microservices by Jan Machacek and Back-Pressure with Akka Streams and Kafka by Anil Gursel and Akara Sucharitakul.

Jan Machacek talked about a system developed with microservices using Akka, Scala and Kafka technologies. The system also uses Apache Cassandra for storage with RabbitMQ and with batch analytics code in Apache Spark. He suggested we need to have good monitoring and tracing capabilities when we are developing distributed systems. Each microservice may publish its internal API so the developers know how to consume the service.

Anil Gursel and Akara Sucharitakul spoke about how to handle high-burst workloads with Akka Streams and Kafka at PayPal organization using stream back-pressure capabilities. They discussed a sample web crawler use case and how they used buffering capabilities in Kafka and back-pressure with asynchronous processing in Akka Streams.

Akka Streams framework provides pure asynchronous stream processing and conforms to reactive streams. They also talked about Squbs, a reactive platform developed at PayPal, that provides capabilities like bootstrap, lifecycle management, loosely coupled module system, and integration hooks for logging and monitoring.


Rate this Article

Adoption Stage

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread
Community comments

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread


Login to InfoQ to interact with what matters most to you.

Recover your password...


Follow your favorite topics and editors

Quick overview of most important highlights in the industry and on the site.


More signal, less noise

Build your own feed by choosing topics you want to read about and editors you want to hear from.


Stay up-to-date

Set up your notifications and don't miss out on content that matters to you