Nenad Bogojevic, platform solutions architect at Amadeus, spoke at the recent KubeCon + CloudNativeCon North America 2017 conference on how to run and manage Kafka clusters in a Kubernetes environment.
They use Kafka for log and event collection and as a streaming platform. Each broker in the Kafka cluster has an identity, which can be used to find the other brokers in the cluster. The brokers also need persistent storage for their partition logs, so it's important to configure a Persistent Volume (PV) for Kafka; otherwise the logs are lost when a pod restarts or is rescheduled.
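As a minimal sketch, a claim for one broker's log directory might look like the following; the storage class and size are assumptions, not values from the talk, and in practice the StatefulSet shown later would generate such claims through `volumeClaimTemplates`:

```yaml
# Hypothetical PersistentVolumeClaim for one broker's partition logs;
# storageClassName and size are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-logs-0
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd
  resources:
    requests:
      storage: 100Gi
```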
He talked about provisioning Kafka clusters and configuring them using a Kubernetes ConfigMap or a custom resource describing parameters such as the topic name, partition count, and replication factor, along with topic properties like retention time in milliseconds. This helps automate the provisioning and deprovisioning of topics, and it ensures consistent configuration across development and operations environments as well as across cluster restarts.
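A topic described as a custom resource could look like the sketch below; the API group and field names are hypothetical, since the talk does not prescribe a schema:

```yaml
# Hypothetical KafkaTopic custom resource; apiVersion and field names
# are assumptions for illustration.
apiVersion: kafka.example.com/v1
kind: KafkaTopic
metadata:
  name: booking-events
spec:
  partitions: 12
  replicationFactor: 3
  config:
    retention.ms: "604800000"  # retain messages for 7 days
```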
Bogojevic discussed how to set up the Kafka and ZooKeeper cluster elements using Kubernetes' StatefulSet feature. This provides the following capabilities (a trimmed-down manifest sketch follows the list):
- Stable pod identity
- Stable storage
- Ordered startup, shutdown
- Rolling updates
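A minimal broker StatefulSet might look like this sketch; the image, ports, and sizes are assumptions, and `volumeClaimTemplates` is what gives each pod its own stable PV:

```yaml
# Minimal Kafka broker StatefulSet sketch; image and resource values
# are illustrative, not from the talk.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless   # headless service provides per-pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: broker
          image: confluentinc/cp-kafka:4.0.0
          ports:
            - containerPort: 9092
          volumeMounts:
            - name: logs
              mountPath: /var/lib/kafka
  volumeClaimTemplates:   # one PVC per pod: logs-kafka-0, logs-kafka-1, ...
    - metadata:
        name: logs
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```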
Their solution architecture includes Kafka and ZooKeeper StatefulSets, each exposed through a headless service so that every pod gets a stable network identity. There is also a discovery service that client applications use to find the Kafka nodes in the cluster. He talked about node selectors, which can be used to land instances on machines with suitable hardware (e.g. SSDs), and anti-affinity rules, which spread instances across different physical machines.
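The headless service and the scheduling constraints could be sketched as follows; the labels and topology key follow standard Kubernetes conventions, while the SSD node label is an assumption:

```yaml
# Headless service (clusterIP: None) giving each broker pod a stable
# DNS name such as kafka-0.kafka-headless.
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
spec:
  clusterIP: None
  selector:
    app: kafka
  ports:
    - name: broker
      port: 9092
---
# Fragment of the broker pod template: a nodeSelector to land on
# SSD-labeled nodes, and anti-affinity to spread brokers across hosts
# (the disktype label is illustrative).
nodeSelector:
  disktype: ssd
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: kafka
        topologyKey: kubernetes.io/hostname
```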
Monitoring is another important component of this architecture: Kubernetes readiness checks can verify that a server is up and accepting connections, and metrics are collected using JMX and Prometheus.
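A readiness check on the broker port can be expressed as a probe on the container; the port and timings below are assumptions:

```yaml
# Readiness probe fragment for the broker container: the pod only
# receives traffic once the Kafka listener accepts TCP connections.
readinessProbe:
  tcpSocket:
    port: 9092
  initialDelaySeconds: 30
  periodSeconds: 10
```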
Bogojevic discussed the Kafka operator, which can be used to transpose the domain knowledge of SRE/operations teams into executable code. They use operators for several components (a sketch of the pattern follows the list):
- Prometheus
- Redis cluster
- Workflow
- Kafka
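As a sketch of the operator pattern, the operator registers a CustomResourceDefinition like the one below and reconciles the cluster whenever a matching resource changes; the group name and schema are hypothetical, matching the earlier KafkaTopic example:

```yaml
# Hypothetical CRD a Kafka operator would watch; group and names are
# assumptions, not from the talk.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: kafkatopics.kafka.example.com
spec:
  group: kafka.example.com
  scope: Namespaced
  names:
    kind: KafkaTopic
    plural: kafkatopics
    singular: kafkatopic
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                partitions:
                  type: integer
                replicationFactor:
                  type: integer
```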
It's a good practice to create Kafka topics using automated scripts; messaging solutions should consider following a "topics as code" approach. Other best practices when operating topics in a Kafka cluster include the following (see the combined sketch after this list):
- Make sure that a topic exists in the target environments
- Make sure that a topic is deleted once it is no longer used
- Propagate the same configuration across environments
- Configure retention based on available disk space
- Configure clients with credentials
- Deliver configuration and requirements as code
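Tying several of these practices together, a "topics as code" manifest checked into version control might size retention against the broker disk; the numbers below are illustrative arithmetic, not values from the talk:

```yaml
# Illustrative topic manifest: retention sized so that replicated
# partitions fit the broker disk (all numbers are assumptions).
apiVersion: kafka.example.com/v1
kind: KafkaTopic
metadata:
  name: audit-log
spec:
  partitions: 6
  replicationFactor: 3
  config:
    # e.g. 100Gi disk per broker / ~10 replicated partitions per broker
    retention.bytes: "10737418240"   # 10Gi per partition
    retention.ms: "259200000"        # 3 days, whichever limit hits first
```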
Bogojevic concluded the presentation by talking about best practices for Kafka upgrades: pin the inter-broker protocol version to the current version, upgrade the Kafka brokers one at a time, and then set the protocol version to the new release. For the on-disk storage format, make sure consumers are on an up-to-date version before updating the message format version to the new release.
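As a sketch, this sequencing can be captured in the broker configuration delivered through a ConfigMap; `inter.broker.protocol.version` and `log.message.format.version` are real Kafka settings, while the version numbers and ConfigMap layout are assumptions:

```yaml
# Broker settings driving a safe rolling upgrade (versions illustrative).
apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-broker-config
data:
  server.properties: |
    # 1. Before upgrading binaries, pin both versions to the current release
    inter.broker.protocol.version=0.11.0
    log.message.format.version=0.11.0
    # 2. Upgrade brokers one at a time, then raise
    #    inter.broker.protocol.version (e.g. to 1.0) and roll again
    # 3. Only after all consumers run an up-to-date client, raise
    #    log.message.format.version to the new release
```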
The video for Bogojevic's talk "Kafka Operator: Managing and Operating Kafka Clusters in Kubernetes" can be found on the CNCF YouTube channel.