High Scalability Workflow Engine Zeebe is Production Ready

Zeebe is a workflow engine from Camunda, designed to meet the scalability requirements of high-performance applications running on cloud-native software architectures and to support workflows that span multiple microservices in low-latency, high-throughput scenarios. Zeebe can operate in an event-driven architecture (EDA) through its support for message events. The new version, 0.20.0, has just been released as a free community edition and is considered production ready.

According to Michael Winters, product manager at Camunda, production ready means that the new version can handle orchestration use cases in a reliable way. The requirements for labelling the release as production ready include:

  • Supports the business logic of most microservices orchestration use cases via BPMN elements
  • Scales horizontally on a Kubernetes cluster
  • Is fault tolerant: no data is lost if a node goes down, and the engine recovers when such a failure occurs
  • Works with cloud-native components such as Docker, Kubernetes, and Apache Kafka
  • Provides workflow data for monitoring, troubleshooting and auditing

Traditionally, workflow engines store the current state of a workflow instance in a database. Zeebe instead uses event sourcing and stores all state changes as immutable events in an append-only event log. A projection of the current state of a workflow is stored as a snapshot using RocksDB. Both the log and the snapshots are stored on disk, which is currently the only option. Other storage options have been discussed, but none are currently on the roadmap.
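The relationship between an append-only log and a snapshot can be illustrated with a minimal Python sketch. It is not Zeebe's implementation: the `Event`, `EventLog` and `project` names, the event shape and the in-memory storage are all assumptions made for illustration, and a plain dictionary stands in for the RocksDB snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """An immutable state change for one workflow instance (hypothetical shape)."""
    position: int
    instance_id: str
    new_state: str


@dataclass
class EventLog:
    """Append-only log: events are added at the end and never modified."""
    _events: List[Event] = field(default_factory=list)

    def append(self, instance_id: str, new_state: str) -> Event:
        event = Event(len(self._events), instance_id, new_state)
        self._events.append(event)
        return event

    def read_from(self, position: int) -> Tuple[Event, ...]:
        return tuple(self._events[position:])


def project(log: EventLog, snapshot: Dict[str, str],
            next_position: int) -> Tuple[Dict[str, str], int]:
    """Fold the events written since the snapshot into a new current-state view."""
    state = dict(snapshot)
    for event in log.read_from(next_position):
        state[event.instance_id] = event.new_state
        next_position = event.position + 1
    return state, next_position


log = EventLog()
log.append("order-1", "CREATED")
log.append("order-1", "PAYMENT_REQUESTED")
log.append("order-2", "CREATED")

# Take a snapshot, then apply only the events that arrived after it.
snapshot, position = project(log, {}, 0)
log.append("order-1", "COMPLETED")
current, position = project(log, snapshot, position)
print(current)
```

Because the snapshot records the log position it was built from, rebuilding the current state only requires replaying the events appended after that position rather than the whole log.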

To achieve the fault tolerance, resilience and horizontal scalability required, Zeebe is built as a distributed system without any central component or database. Multiple Zeebe brokers can be set up in a peer-to-peer cluster, which uses the Gossip protocol to route data within the cluster. For replication of the event log, the Raft consensus algorithm is used. To scale out, partitioning can be used, with every partition using a separate event log.
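The idea of one event log per partition can be sketched in a few lines of Python. The article does not say how Zeebe assigns workflow instances to partitions, so the hash-based routing below, the `PARTITION_COUNT` setting and the `partition_for` and `append_event` helpers are purely illustrative assumptions; replication and gossip are left out entirely.

```python
import zlib
from typing import Dict, List

PARTITION_COUNT = 3  # hypothetical cluster setting


def partition_for(instance_id: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map an instance to a partition with a stable hash (assumed routing rule)."""
    return zlib.crc32(instance_id.encode("utf-8")) % partition_count


# Every partition owns its own, independent append-only log.
logs: Dict[int, List[str]] = {p: [] for p in range(PARTITION_COUNT)}


def append_event(instance_id: str, event: str) -> int:
    """Append an event to the log of the partition that owns the instance."""
    partition = partition_for(instance_id)
    logs[partition].append(f"{instance_id}:{event}")
    return partition


for instance in ("order-1", "order-2", "order-3", "order-4"):
    append_event(instance, "CREATED")
    append_event(instance, "COMPLETED")

for partition, entries in logs.items():
    print(partition, entries)
```

Since all events of one instance land in the same log, each partition can be processed and replicated independently, which is what allows the work to be spread over more brokers.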

Zeebe runs as a separate instance on a Java virtual machine (JVM), which is a remote engine approach with applications communicating with Zeebe over the network. To maintain performance, streaming into the client and a binary protocol (gRPC) are used. This approach creates a defined setup and environment for Zeebe, and provides isolation from application code.

Zeebe does not implement any ACID transaction protocol. To mitigate this and handle the failures that may occur when a task from within a workflow is executed, there are two options. By notifying Zeebe after the job is completed, at-least-once execution is achieved: Zeebe will be aware of a failure and execute the task again. By notifying Zeebe before the job is completed, at-most-once execution is achieved: Zeebe is unaware of any failure and the task will not be executed again. In an event-driven architecture, workflows can subscribe to external messages, and Zeebe supports exactly-once semantics when processing these messages.
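The difference between the two options comes down to the order of "do the work" and "report completion". The following Python sketch simulates both orderings with a task that fails once. The `Engine` and `FlakyTask` classes and the two `run_*` functions are hypothetical stand-ins, not the Zeebe client API, and the retry loop only imitates the engine handing the job out again.

```python
class Engine:
    """Toy stand-in for the workflow engine; not the Zeebe API."""

    def __init__(self) -> None:
        self.completed = False

    def complete(self) -> None:
        self.completed = True


class FlakyTask:
    """A task that fails a given number of times before it succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures_left = failures
        self.attempts = 0
        self.effects = 0  # how often the business side effect actually happened

    def __call__(self) -> None:
        self.attempts += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            raise RuntimeError("worker crashed")
        self.effects += 1


def run_at_least_once(task: FlakyTask, engine: Engine, max_attempts: int = 3) -> None:
    """Report completion only after the task succeeded; retry on failure."""
    for _ in range(max_attempts):
        try:
            task()
        except RuntimeError:
            continue  # no completion was reported, so the job is tried again
        engine.complete()
        return


def run_at_most_once(task: FlakyTask, engine: Engine) -> None:
    """Report completion first; a later failure is invisible to the engine."""
    engine.complete()
    try:
        task()
    except RuntimeError:
        pass  # the engine already considers the job done, so there is no retry


retried, engine_a = FlakyTask(failures=1), Engine()
run_at_least_once(retried, engine_a)
print(retried.attempts, retried.effects, engine_a.completed)

skipped, engine_b = FlakyTask(failures=1), Engine()
run_at_most_once(skipped, engine_b)
print(skipped.attempts, skipped.effects, engine_b.completed)
```

In the first run the task is attempted twice and takes effect once; in the second it is attempted once and never takes effect, although the engine regards the job as completed. With at-least-once execution, a worker that crashes after the side effect but before reporting would cause the effect to be repeated, so tasks run this way are usually written to be idempotent.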

Since Zeebe uses event sourcing, it cannot easily handle queries to find, for example, workflow instances with problems. Instead, Exporters and the concept of CQRS are used. An exporter can access the event stream and create projections of the data, and this technique is used by Operate, a tool that comes with Zeebe for monitoring and troubleshooting workflows.
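How an exporter turns an event stream into a queryable read model can be sketched as follows. This is a simplified Python illustration of the CQRS idea, not Zeebe's exporter interface: the `ReadModelExporter` class, the `(instance_id, state)` record shape and the state names are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

Record = Tuple[str, str]  # (instance_id, state); hypothetical record shape


class ReadModelExporter:
    """Consumes records from the event stream and maintains a query-friendly view."""

    def __init__(self) -> None:
        self._state_by_instance: Dict[str, str] = {}
        self._instances_by_state: DefaultDict[str, Set[str]] = defaultdict(set)

    def export(self, record: Record) -> None:
        """Apply one record: move the instance from its old state to the new one."""
        instance_id, state = record
        previous = self._state_by_instance.get(instance_id)
        if previous is not None:
            self._instances_by_state[previous].discard(instance_id)
        self._state_by_instance[instance_id] = state
        self._instances_by_state[state].add(instance_id)

    def instances_in(self, state: str) -> Set[str]:
        """Query side: answer from the projection without touching the log."""
        return set(self._instances_by_state.get(state, set()))


stream = [
    ("order-1", "ACTIVE"),
    ("order-2", "ACTIVE"),
    ("order-2", "INCIDENT"),
    ("order-1", "COMPLETED"),
    ("order-3", "INCIDENT"),
]

exporter = ReadModelExporter()
for record in stream:
    exporter.export(record)

print(sorted(exporter.instances_in("INCIDENT")))
```

The write side (the log) stays append-only, while the read side is a separate structure shaped for the question being asked, here "which instances currently have a problem".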

Lately, there have been discussions about open source versus source-available licenses, where companies behind open source software restrict their licenses to prevent as-a-service providers from providing managed service offerings. Zeebe is distributed under The Zeebe Community License, a comparable source-available license, which does not allow for commercial offerings of a workflow service that uses Zeebe. The tool Operate is still in preview and comes with a developer license for free, non-production use. It is also possible to get enterprise support for both products.

The last developer preview release, 0.18.0, was feature complete, but the team behind Zeebe knows there will be a lot to learn from coming production deployments. To distinguish between production ready and finished, they therefore decided to call the new release 0.20.0, instead of 1.0.0.

In two blog posts, Bernd Rücker, co-founder and chief technologist of Camunda, describes the basics and main concepts of Zeebe and how they built a highly scalable distributed state machine.

To get help in understanding the main concepts of Zeebe and get a workflow up and running, a getting started tutorial is available.
