Jay Kreps on Distributed Stream Processing with Apache Kafka and Kafka Streams

Apache Kafka and Kafka Streams frameworks help with developing stream-centric architectures and distributed stream processing applications. Jay Kreps, CEO of Confluent, gave the keynote presentation on stream processing and microservices at Reactive Summit 2016 Conference last week.

Kreps said there has been a lot of research on the database technologies but not much on message queues. Messaging can be used as the services backbone in microservices-based application architectures.

He talked about the three paradigms of programming: Request/Response, Batch, and Stream Processing, and the differences between these paradigms. A stream processing approach can be used for both online and batch use cases. It’s important to note that stream processing is not faster MapReduce; it’s a different paradigm of how data is processed and analyzed. Kreps discussed the four core APIs of Kafka-based stream processing: Producer, Consumer, Connector, and Streams.

Kafka Streams is a Java library for building fault-tolerant, distributed stream processing applications. It supports API methods like map, filter, aggregate (count, sum), and join.

In another keynote presentation at the conference, Peter Alvaro from UC Santa Cruz talked about automated failure testing large-scale distributed fault-tolerant systems. He talked about the Lineage-Driven Fault Injection (LDFI) approach which utilizes the tracing and logging information to identify redundant computations to help with testing.

Peter talked about the four requirements needed for failure testing of systems:

Real problems
Real systems
Time to think
Freedom To fail

For more information on this topic, you can check out the Netflix blog post on automated failure testing.

Other sessions at the second day of the conference included monolith to reactive microservices by Jan Machacek and Back-Pressure with Akka Streams and Kafka by Anil Gursel and Akara Sucharitakul.

Jan Machacek talked about a system developed with microservices using Akka, Scala and Kafka technologies. The system also uses Apache Cassandra for storage with RabbitMQ and with batch analytics code in Apache Spark. He suggested we need to have good monitoring and tracing capabilities when we are developing distributed systems. Each microservice may publish its internal API so the developers know how to consume the service.

Anil Gursel and Akara Sucharitakul spoke about how to handle high-burst workloads with Akka Streams and Kafka at PayPal organization using stream back-pressure capabilities. They discussed a sample web crawler use case and how they used buffering capabilities in Kafka and back-pressure with asynchronous processing in Akka Streams.

Akka Streams framework provides pure asynchronous stream processing and conforms to reactive streams. They also talked about Squbs, a reactive platform developed at PayPal, that provides capabilities like bootstrap, lifecycle management, loosely coupled module system, and integration hooks for logging and monitoring.

InfoQ Software Architects' Newsletter

Follow us on

Rate this Article

This content is in the AI, ML & Data Engineering topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter