Data Streaming Architecture with Apache Flink
Jamie Grier, director of applications engineering at data Artisans, recently spoke at the OSCON 2016 conference about data streaming architecture using Apache Flink. He talked about the building blocks of data streaming applications.
Data-streaming architectures are used to process data that's continuously produced as streams of events over time, instead of static datasets. Compared to the traditional centralized "state of the world" databases and data warehouses, data streaming applications work on the streams of events and on application-specific local state that is an aggregate of the history of events. Some of the advantages of streaming data processing are:
- Decreased latency from signal to decision
- Unified way of handling real-time and historic data
- Time travel queries
Apache Flink is an open source platform for distributed stream and batch data processing. Flink was inspired by Google's Dataflow model. It provides a stream processing API for the Java and Scala programming languages. Unlike streaming data processing frameworks that rely on micro-batching, Flink is based on "message at a time" stream processing.
Jamie talked about stateful stream processing and showed some code examples of Flink applications, along with monitoring using InfluxDB, an open source time series database, and the Grafana visualization tool.
He also talked about the concept of windowing in stream processing, contrasting windowing in processing time vs. event time. Windowing in processing time groups events by when they reach the processor rather than when they occurred, so delayed or out-of-order data can land in the wrong window and introduce errors into the analytics. In the event time approach, windowing comes from the data, not the clock time. Data is processed according to a timestamp embedded in each event, which allows you to compute more accurate results.
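The difference between the two notions of time can be shown with a minimal sketch in plain Java. It does not use the Flink API; the `Event` class, the `countPerWindow` helper, and the sample timestamps are all hypothetical and exist only for illustration. The same events are counted in 10-second tumbling windows twice: once by the timestamp carried in the data, and once by the time each event arrived.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

class WindowingSketch {
    /** Hypothetical event: when it happened (from the data) and when it reached the processor. */
    static class Event {
        final String id;
        final long eventTimeMs;
        final long arrivalTimeMs;

        Event(String id, long eventTimeMs, long arrivalTimeMs) {
            this.id = id;
            this.eventTimeMs = eventTimeMs;
            this.arrivalTimeMs = arrivalTimeMs;
        }
    }

    /** Counts events per tumbling window, keyed by window start, using the chosen timestamp. */
    static Map<Long, Integer> countPerWindow(List<Event> events, long windowSizeMs,
                                             ToLongFunction<Event> timestamp) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Event e : events) {
            long ts = timestamp.applyAsLong(e);
            long windowStart = ts - (ts % windowSizeMs); // align to the window boundary
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Listed in arrival order; "c" happened at 9s but only arrived at 12.5s.
        List<Event> events = Arrays.asList(
            new Event("a", 1_000, 1_200),
            new Event("b", 4_000, 4_100),
            new Event("d", 11_000, 11_300),
            new Event("c", 9_000, 12_500));

        // Event time assigns "c" to the window in which it actually occurred.
        System.out.println("event time:      "
            + countPerWindow(events, 10_000, e -> e.eventTimeMs));   // {0=3, 10000=1}
        // Processing time assigns "c" to the window in which it happened to arrive.
        System.out.println("processing time: "
            + countPerWindow(events, 10_000, e -> e.arrivalTimeMs)); // {0=2, 10000=2}
    }
}
```

In this sketch the late event shifts one count from the first window to the second under processing time, while the event-time result stays the same regardless of arrival delay. A real stream processor also has to decide how long to wait for late data before closing a window, which this sketch leaves out.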
Jamie also discussed failure handling and fault tolerance when using Flink in applications. The savepoints feature in Flink allows you to update your programs and the Flink cluster without losing any state. These state snapshots are important when you are doing stream processing on real, live data.
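The idea behind resuming from a state snapshot can be sketched in plain Java. This is only a conceptual illustration, not Flink's savepoint mechanism or API: the `CountingJob` class and its `snapshot` method are hypothetical stand-ins. One version of a stateful job captures its state, and a second version starts from that copy instead of from scratch.

```java
import java.util.HashMap;
import java.util.Map;

class SavepointSketch {
    /** Hypothetical stateful job: keeps a running count per key. */
    static class CountingJob {
        private final Map<String, Long> counts;

        /** Start with empty state. */
        CountingJob() {
            this(new HashMap<>());
        }

        /** Start from previously captured state instead of from scratch. */
        CountingJob(Map<String, Long> restored) {
            this.counts = new HashMap<>(restored);
        }

        void process(String key) {
            counts.merge(key, 1L, Long::sum);
        }

        /** Capture an independent copy of the state (a stand-in for a savepoint). */
        Map<String, Long> snapshot() {
            return new HashMap<>(counts);
        }

        long count(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }

    public static void main(String[] args) {
        CountingJob v1 = new CountingJob();
        v1.process("click");
        v1.process("click");
        v1.process("view");

        // Taken before the first version of the job is shut down.
        Map<String, Long> savepoint = v1.snapshot();

        // The "upgraded" job resumes from the snapshot and keeps counting.
        CountingJob v2 = new CountingJob(savepoint);
        v2.process("click");

        System.out.println(v2.count("click")); // 3
        System.out.println(v2.count("view"));  // 1
    }
}
```

The snapshot is a copy, so events processed by the old job after the snapshot was taken are not reflected in the restored job. A distributed system additionally has to record the position in the input stream alongside the state so that processing can resume from a consistent point.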
If you are interested in learning more about Apache Flink, check out the project's website. Also, the Flink Forward 2016 conference will be held in September in Berlin. The deadline for submitting proposals is June 30, 2016.