Neha Narkhede: Large-Scale Stream Processing with Apache Kafka

by Ralph Winzinger on Jun 19, 2016. Estimated reading time: 2 minutes

In her presentation "Large-Scale Stream Processing with Apache Kafka" at QCon New York 2016, Neha Narkhede introduces Kafka Streams, a new feature of Kafka for processing streaming data. According to Narkhede, stream processing has become popular because unbounded datasets can be found in many places. It is no longer a niche problem in the way that, for example, machine learning is.

Narkhede starts by introducing the basic programming paradigms for working with data:

  • Request / response cycles
  • Batch processing
  • Stream processing

and continues by giving a practical example of stream processing from the retail domain: sales and shipments are essentially unbounded datasets, and working on such datasets is effectively stream processing. Sales and shipments form a stream of events ("what happened"), and a function that recalculates prices based on those events ("do something") is the stream processor.
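The retail example can be sketched in a few lines of plain Python. This is an illustration only, not Kafka Streams code and not taken from the talk: the `Event` shape, the `reprice` function, and the 5% out-of-stock markup are all assumptions made up for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    kind: str      # "sale" or "shipment" (hypothetical event types)
    item: str
    quantity: int


def reprice(events: Iterable[Event],
            base_prices: Dict[str, float]) -> Iterator[Tuple[str, float]]:
    """Consume events one at a time and emit a price after each one.

    Hypothetical rule: a sale lowers stock, a shipment raises it, and an
    item that is out of stock gets a 5% markup over its base price.
    """
    stock: Dict[str, int] = {}
    for event in events:  # the input may be unbounded; it is never loaded as a whole
        delta = -event.quantity if event.kind == "sale" else event.quantity
        stock[event.item] = stock.get(event.item, 0) + delta
        base = base_prices[event.item]
        price = base * 1.05 if stock[event.item] <= 0 else base
        yield event.item, round(price, 2)


events = [
    Event("shipment", "widget", 10),
    Event("sale", "widget", 4),
    Event("sale", "widget", 6),
]
prices = list(reprice(events, {"widget": 20.0}))
```

The events are "what happened" and `reprice` is the "do something" part. Because it is a generator, it produces a result per event rather than waiting for the dataset to end, which is the property that separates it from a batch job.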

Narkhede mentions the two most popular options that developers have today for stream processing. First, there is the do-it-yourself approach, which might seem reasonable for simple scenarios but can get complex when capabilities like ordering, scalability, fault tolerance or processing past data are involved. Second, developers can pick solutions like Spark or Samza, which are heavyweight and often designed for MapReduce. In Narkhede's opinion, however, stream processing looks more like event-based microservices than MapReduce, and this is what Kafka Streams is designed for.

Kafka Streams is a lightweight library that is embedded in an application without imposing any restrictions on packaging or deployment. Narkhede continues by giving an overview of how important capabilities of stream processing systems are realized:

  • Scalability is provided automatically since the event log is partitioned. Thus, Kafka Streams-based applications can form a cluster. Consumer libraries also assist in parallel data processing.
  • Fault tolerance is also provided out of the box. There is no master in a cluster of Kafka Streams nodes, just peers. Local state is more or less just a cache, and if a node goes down, data processing is simply shifted to another node.
  • Stateful processing, as needed by joins or windowed calculations, is also supported. In such cases the necessary data is pushed to the processor to avoid remote access.
  • Reprocessing of data with changed business logic is supported by letting new consumers start event processing with an offset of zero (from the beginning).
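Two of the points above, partitioning and reprocessing from offset zero, can be illustrated with a toy in-memory log. This is a sketch under stated assumptions and does not use the Kafka client API: `PartitionedLog` and `process` are hypothetical names, and the CRC32 hash merely stands in for whatever partitioner a real deployment uses.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]  # (key, value)


class PartitionedLog:
    """Toy append-only log split into partitions by key hash."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: int) -> int:
        # The same key always lands in the same partition, so per-key order is kept.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

    def read(self, partition: int, offset: int = 0) -> List[Record]:
        # Reading never removes records, so any consumer can start at any offset.
        return self.partitions[partition][offset:]


def process(log: PartitionedLog, logic: Callable[[int, int], int]) -> Dict[str, int]:
    """Build per-key state from scratch by replaying every partition from offset zero."""
    state: Dict[str, int] = defaultdict(int)
    for p in range(len(log.partitions)):  # each partition could go to a different worker
        for key, value in log.read(p, offset=0):
            state[key] = logic(state[key], value)
    return dict(state)


log = PartitionedLog(num_partitions=3)
placements = defaultdict(set)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("b", 4)]:
    placements[key].add(log.append(key, value))

totals = process(log, lambda acc, v: acc + v)      # original business logic
maxima = process(log, lambda acc, v: max(acc, v))  # changed logic, same log replayed
```

Since `process` derives its state purely from the log, the state behaves like a cache: if it is lost, replaying the partitions on another node rebuilds it, and replaying with a different `logic` function is what reprocessing with changed business logic amounts to.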

Narkhede then continues by introducing the duality of tables and streams as the basic principle Kafka Streams uses to implement these features: the concepts of tables ("state of the world") and streams ("how did the state evolve") are combined. Kafka Streams-based applications can therefore be reactive and stateful at the same time. Having both concepts available also leads to simplified architectures.
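The duality can be made concrete with a small plain-Python sketch: a table is what remains after folding a stream of changes, and a stream is what is recorded when changes to a table are captured. The function names and the convention that `None` marks a delete are assumptions for illustration, not the Kafka Streams API.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Change = Tuple[str, Optional[int]]  # (key, new value); None marks a delete


def table_from_stream(changelog: Iterable[Change]) -> Dict[str, int]:
    """Stream -> table: replay the changes; the latest value per key wins."""
    table: Dict[str, int] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def apply_and_capture(table: Dict[str, int], updates: Iterable[Change]) -> List[Change]:
    """Table -> stream: apply updates in place and record each one as a change."""
    changelog: List[Change] = []
    for key, value in updates:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
        changelog.append((key, value))
    return changelog


inventory: Dict[str, int] = {}
changelog = apply_and_capture(
    inventory,
    [("widget", 10), ("gadget", 5), ("widget", 6), ("gadget", None)],
)
rebuilt = table_from_stream(changelog)
```

Here `inventory` is the "state of the world" and `changelog` is "how did the state evolve"; rebuilding the table from the changelog yields the same state, which is why an application can react to each change and still hold queryable state.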

Neha Narkhede concludes by providing a brief glimpse of Kafka Connect, a core component for getting data into and out of Kafka by connecting it to systems such as databases, Hadoop or Elasticsearch.

Please note that most QCon presentations will be made available for free on InfoQ in the weeks after the conference and slides are available for download on the conference web site.