Data Streaming Architecture with Apache Flink

| by Srini Penchikala on Jun 09, 2016. Estimated reading time: 1 minute |

Jamie Grier, Director of Applications Engineering at data Artisans, recently spoke at the OSCON 2016 conference about data streaming architecture using Apache Flink. He talked about the building blocks of data streaming applications.

Data-streaming architectures are used to process data that's continuously produced as streams of events over time, instead of static datasets. Compared to the traditional centralized "state of the world" databases and data warehouses, data streaming applications work on the streams of events and on application-specific local state that is an aggregate of the history of events. Some of the advantages of streaming data processing are:

  • Decreased latency from signal to decision
  • Unified way of handling real-time and historic data
  • Time travel queries
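
The idea of local state as an aggregate of the event history can be sketched in a few lines of plain Python. This is not Flink code; the `Event` type, the `fold_state` function and the account balances are hypothetical names and data chosen only for illustration. Replaying the same log up to an earlier timestamp is one simple way to picture a time travel query.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional


class Event(NamedTuple):
    timestamp: int  # event time; the unit is an assumption of this sketch
    account: str    # key that the local state is partitioned by
    amount: int     # signed change carried by the event


def fold_state(events: Iterable[Event], as_of: Optional[int] = None) -> Dict[str, int]:
    """Aggregate events into per-key state, optionally only up to `as_of`."""
    state: Dict[str, int] = defaultdict(int)
    for event in events:
        if as_of is not None and event.timestamp > as_of:
            continue  # ignore events that happened after the requested point in time
        state[event.account] += event.amount
    return dict(state)


log = [
    Event(1, "alice", 100),
    Event(2, "bob", 50),
    Event(3, "alice", -30),
    Event(4, "bob", 25),
]

current = fold_state(log)             # state after the whole history
historic = fold_state(log, as_of=2)   # state as it was at timestamp 2
```

The same function serves both the live view and the historic view, which is the sense in which real-time and historic data are handled in a unified way.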

Apache Flink is an open source platform for distributed stream and batch data processing. Flink was inspired by Google's Dataflow model. It provides a stream processing API for the Java and Scala programming languages. Compared to other streaming data processing frameworks, there is no micro-batching of data in Flink. It's based on "message at a time" stream processing.
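
The difference between "message at a time" processing and micro-batching can be illustrated with two small Python generators. This is a conceptual sketch only, not how Flink or any other framework is implemented; the function names are hypothetical, and batching by a fixed count (rather than by a time interval) is a simplifying assumption.

```python
from typing import Iterable, Iterator, List


def per_event_totals(values: Iterable[int]) -> Iterator[int]:
    """Emit an updated running total after every single event."""
    total = 0
    for value in values:
        total += value
        yield total


def micro_batch_totals(values: Iterable[int], batch_size: int) -> Iterator[int]:
    """Emit an updated running total only once a batch has been collected."""
    total = 0
    batch: List[int] = []
    for value in values:
        batch.append(value)
        if len(batch) == batch_size:
            total += sum(batch)
            batch.clear()
            yield total
    if batch:  # flush the final, partially filled batch
        total += sum(batch)
        yield total


stream = [1, 2, 3, 4, 5]
per_event = list(per_event_totals(stream))          # [1, 3, 6, 10, 15]
micro_batched = list(micro_batch_totals(stream, 2))  # [3, 10, 15]
```

Both variants arrive at the same final total, but the per-event version produces a result as soon as each event is seen, while the batched version has to wait for the batch to fill.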

Jamie talked about stateful stream processing and showed some code examples of Flink applications, along with monitoring using InfluxDB, an open source time series database, and the Grafana visualization tool.

He also talked about the windowing concept in stream processing and about windowing in processing time vs. event time. Windowing in processing time assigns events to windows according to the clock of the machine that processes them, so delayed or out-of-order events can land in the wrong window and introduce errors into the analytics. In the event time approach, windowing comes from the data, not the clock time. With event time, data is processed according to a timestamp embedded in the data, which allows you to compute more accurate results.
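
To make the distinction concrete, here is a minimal plain-Python sketch of tumbling windows computed both ways. It is not Flink's windowing API; the `Reading` type, the `tumbling_sums` function, the window size of 10 and the sample data are all assumptions made for illustration. The sketch also leaves out watermarks, which real engines use to decide when an event time window can be considered complete.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Reading(NamedTuple):
    event_time: int    # when the event happened (timestamp embedded in the data)
    arrival_time: int  # when the processor saw it (wall clock of the machine)
    value: int


def tumbling_sums(readings: Iterable[Reading], size: int, use_event_time: bool) -> Dict[int, int]:
    """Sum values per fixed-size window, keyed by the window's start time."""
    sums: Dict[int, int] = defaultdict(int)
    for reading in readings:
        ts = reading.event_time if use_event_time else reading.arrival_time
        window_start = (ts // size) * size
        sums[window_start] += reading.value
    return dict(sums)


readings = [
    Reading(event_time=1, arrival_time=2, value=10),
    Reading(event_time=4, arrival_time=5, value=20),
    Reading(event_time=8, arrival_time=13, value=30),  # delayed in transit
    Reading(event_time=12, arrival_time=14, value=40),
]

by_event_time = tumbling_sums(readings, size=10, use_event_time=True)
by_processing_time = tumbling_sums(readings, size=10, use_event_time=False)
# by_event_time      -> {0: 60, 10: 40}
# by_processing_time -> {0: 30, 10: 70}
```

The third reading happened in the first window but arrived during the second one. Processing time credits it to the wrong window, while event time places it where it actually occurred.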

Jamie also discussed failure handling and fault tolerance when using Flink in applications. The savepoints feature in Flink allows you to update programs and the Flink cluster without losing any state. These savepoint snapshots are important when you are doing stream processing on real, live data.
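
The general idea behind resuming from a saved snapshot can be sketched as follows. This is a single-process analogy in plain Python, not Flink's savepoint mechanism or API; the `CountingJob` class, its methods and the JSON format are hypothetical. The sketch assumes the input can be re-read from a stored offset.

```python
import json
from typing import Dict, List, Optional


class CountingJob:
    """Counts words from a replayable log and remembers how far it has read."""

    def __init__(self, counts: Optional[Dict[str, int]] = None, offset: int = 0) -> None:
        self.counts: Dict[str, int] = dict(counts or {})
        self.offset = offset

    def process(self, log: List[str], up_to: int) -> None:
        """Consume log entries from the current offset up to (not including) `up_to`."""
        end = min(up_to, len(log))
        for word in log[self.offset:end]:
            self.counts[word] = self.counts.get(word, 0) + 1
        self.offset = max(self.offset, end)

    def savepoint(self) -> str:
        """Serialize the state together with the input position it corresponds to."""
        return json.dumps({"counts": self.counts, "offset": self.offset})

    @classmethod
    def restore(cls, snapshot: str) -> "CountingJob":
        data = json.loads(snapshot)
        return cls(counts=data["counts"], offset=data["offset"])


log = ["a", "b", "a", "c", "b", "a"]

old_job = CountingJob()
old_job.process(log, up_to=3)
snapshot = old_job.savepoint()            # take a snapshot, then stop the old job

new_job = CountingJob.restore(snapshot)   # stands in for an updated program
new_job.process(log, up_to=len(log))

uninterrupted = CountingJob()
uninterrupted.process(log, up_to=len(log))
```

Because the snapshot stores the state and the input position together, the restarted job ends up with the same counts as a job that was never stopped.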

If you are interested in learning more about Apache Flink, check out the project's website. Also, the Flink Forward 2016 conference will be held in September in Berlin. The last date for submitting proposals is June 30, 2016.




