Microservices and Stream Processing Architecture at Zalando Using Apache Flink
Javier Lopez and Mihail Vieru spoke at the Reactive Summit 2016 conference earlier this month about a cloud-based data integration and distribution platform used for stream processing in business intelligence use cases.
Zalando is an online fashion retailer in Europe which is transitioning from a monolithic to a microservices architecture and from a hierarchical to an agile organization.
In their architecture, applications communicate with each other using REST APIs and the databases are hidden behind the Amazon Virtual Private Cloud (VPC) infrastructure. All teams publish their data to a central event bus. In the architecture model, applications call microservices through REST APIs; the microservices interact with the event bus, and the data then flows through Kafka and an exporter to the AWS S3 data store. They also use a data lake that provides distributed access and fine-grained security for the data.
Lopez and Vieru discussed how to use the Flink framework in a microservices architecture. Flink is used to process the stream data based on event time, ingestion time, and processing time. It also handles backpressure implicitly through its system architecture.
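The difference between event time and processing time can be illustrated with a minimal sketch in plain Python. This is not Flink code and not the speakers' implementation; the `Event` class, the `tumbling_counts` helper, the order IDs, and the 60-second window are all assumptions made for illustration. It shows how the same three events land in different windows depending on which timestamp is used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event record; field names are illustrative only.
    order_id: str
    event_time: int       # seconds: when the event occurred at the source
    processing_time: int  # seconds: when the stream operator observed it


def tumbling_counts(events, window_size, time_of):
    """Count events per fixed-size window, keyed by window start time."""
    counts = defaultdict(int)
    for event in events:
        t = time_of(event)
        window_start = t - (t % window_size)
        counts[window_start] += 1
    return dict(counts)


events = [
    Event("o1", event_time=5, processing_time=7),
    # Occurred in the first minute but arrived in the second one.
    Event("o2", event_time=58, processing_time=63),
    Event("o3", event_time=61, processing_time=64),
]

by_event_time = tumbling_counts(events, 60, lambda e: e.event_time)
by_processing_time = tumbling_counts(events, 60, lambda e: e.processing_time)

print(by_event_time)       # {0: 2, 60: 1}
print(by_processing_time)  # {0: 1, 60: 2}
```

Grouping by event time keeps the delayed order `o2` in the minute in which it actually happened, which is usually what business metrics such as order velocity need.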
Their business process includes more than 1,000 event types, with a Kafka topic for each event type. They analyze processes by correlating event types (using operations like join and union) and enrich the data based on business rules. Stream processing is done using sliding windows (1 minute to 48 hours) for platform snapshots.
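As a rough sketch of the windowing idea, the plain Python below unions two correlated event streams and counts events per sliding window. It imitates the concept only; it does not use Flink's API, and the event type names, timestamps, and the 120-second window with a 60-second slide are hypothetical values chosen to keep the example small.

```python
from collections import Counter
from itertools import chain


def sliding_window_starts(timestamp, size, slide):
    """Return the start times of every sliding window that contains timestamp.

    A window starting at s covers the half-open interval [s, s + size).
    """
    start = timestamp - (timestamp % slide)  # latest window containing it
    starts = []
    while start > timestamp - size:
        starts.append(start)
        start -= slide
    return starts


def count_per_window(events, size, slide):
    """Count events per sliding window, keyed by window start time."""
    counts = Counter()
    for _event_type, timestamp in events:
        for start in sliding_window_starts(timestamp, size, slide):
            counts[start] += 1
    return dict(counts)


# Two hypothetical event types, each of which would have its own topic.
order_created = [("order_created", 10), ("order_created", 70)]
order_shipped = [("order_shipped", 95)]

# Union: merge the correlated streams before windowing.
unioned = chain(order_created, order_shipped)
counts = count_per_window(unioned, size=120, slide=60)

print(counts)  # {0: 3, -60: 1, 60: 2}
```

Because the windows overlap, a single event contributes to `size / slide` windows, so each snapshot reflects the last two minutes while still being refreshed every minute.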
The architecture also includes OAuth for security, a PostgreSQL-based configuration service, an alert service, and visualization components built using Python.
The speakers discussed two use cases of stream processing: a near real-time business process monitoring solution and streaming ETL. Real-time process monitoring helps analyze data streams such as order velocities and delivery velocities, and helps control service level agreements (SLAs). Streaming ETL is used to free up resources in the relational data warehouse. This solution helps the data warehouse cope with higher loads, reduces latency, and makes the platform more scalable.
They also talked about future use cases for stream processing, such as near real-time sales and price monitoring and payment fraud detection. Complex event processing for BPM, together with Flink’s CEP library and state capabilities, will be used for these use cases.
For more details about their event stream processing architecture, check out the company's blog.