KubeCon NA 2023: Kubernetes Storage Platform to Run Real-Time Analytic Databases

The Kubernetes storage platform provides a portable and flexible data management foundation to help developers build their own data solutions. Robert Hodges, CEO at Altinity, presented "Adventures in Data - Leaning on Kubernetes Storage to Run Hundreds of Real-Time Analytic Databases" at the KubeCon + CloudNativeCon North America 2023 conference in Chicago. He discussed the techniques his teams developed to build their data platform: separating compute and storage with Amazon's Elastic Block Store (EBS), extending persistent volumes without downtime, and changing storage parameters through a custom controller component.

Hodges began the presentation with an overview of an open source real-time analytics database called ClickHouse. This column-oriented database can run on bare metal or cloud infrastructure and is designed based on shared-nothing architecture. ClickHouse supports parallel and vectorized execution of queries and other distributed database capabilities like replication, sharding, and distributed query execution.

He talked about how they mapped the database to the Kubernetes (K8s) platform by writing the Altinity ClickHouse Operator. The ClickHouse server runs as a Docker image, and the database components are represented as K8s resources using StatefulSets, which provide a nice abstraction for services. The data is stored on AWS EBS storage. He also discussed a few challenges they experienced when running the database on Kubernetes.

Database replicas ran asymmetrically, with different Availability Zones (AZs), resources, and software versions. There were also diverse data access requirements (read/write vs. read-only). To map resources, they used one StatefulSet per database server, with each StatefulSet fronting a Pod and its Persistent Volume (PV). The team created different podTemplate values to divide pods by zone and a common volumeClaimTemplate to ensure all pods have the same storage spec. This lets them map servers precisely to VMs and storage.
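The per-server mapping described above can be sketched as follows. This is a minimal illustration in Python that builds plain manifest dictionaries, not the output of the Altinity ClickHouse Operator. The replica names, zone names, storage size, and the `gp3-encrypted` storage class are hypothetical; the zone label `topology.kubernetes.io/zone` is the standard Kubernetes well-known label.

```python
import json

# One storage spec shared by every replica (the "common volumeClaimTemplate").
# The storage class name and size are hypothetical placeholders.
SHARED_CLAIM = {
    "metadata": {"name": "data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "gp3-encrypted",
        "resources": {"requests": {"storage": "100Gi"}},
    },
}


def replica_statefulset(name, zone):
    """Build a single-replica StatefulSet manifest pinned to one zone."""
    labels = {"app": "clickhouse", "replica": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": 1,  # one database server per StatefulSet
            "selector": {"matchLabels": labels},
            "template": {  # the per-replica pod template
                "metadata": {"labels": labels},
                "spec": {
                    # Pin the pod to a zone via the well-known node label.
                    "nodeSelector": {"topology.kubernetes.io/zone": zone},
                    "containers": [{
                        "name": "clickhouse",
                        "image": "clickhouse/clickhouse-server",
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/clickhouse"}
                        ],
                    }],
                },
            },
            "volumeClaimTemplates": [SHARED_CLAIM],
        },
    }


# Hypothetical replica-to-zone layout: two replicas in different zones.
layout = {"chi-replica-0": "us-east-1a", "chi-replica-1": "us-east-1b"}
manifests = [replica_statefulset(name, zone) for name, zone in layout.items()]

if __name__ == "__main__":
    # Kubernetes accepts JSON manifests, so this output could be reviewed
    # and then applied with kubectl.
    print(json.dumps(manifests, indent=2))
```

Because each server has its own StatefulSet, the pod template can differ per replica (zone, resources, image version) while the storage spec stays identical.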

He shared some storage performance statistics comparing Non-Volatile Memory Express (NVMe) SSD storage on an i3.4xlarge VM against cloud block storage on an m6i.4xlarge VM instance. For cached query responses, EBS with m6i.4xlarge was faster in every query. Those queries mostly read out of memory, and the clock speed was faster in the EBS configuration. This makes block storage solutions like EBS better options to use behind the database clusters. The performance tests were run with a benchmarking tool called ClickBench.

Separation of storage and compute is another advantage of using Kubernetes for cloud-hosted databases. If you use an EBS-like solution, you can scale compute and storage independently. He talked about K8s features like node selectors for zones, anti-affinity, and taints and tolerations, which keep ClickHouse from contending with other services. Another technique he discussed was to zero out StatefulSet replicas to shut off the compute. Dialing the "replicas" value down to zero shuts down the pods but keeps the storage alive (i.e., the StatefulSets and Persistent Volumes are still there). Other customizations include the Altinity EBS Params Controller, which applies custom annotations to alter many volumes at once, and extending block storage in a live system without a restart.
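The scale-to-zero and live-extension techniques both come down to small patches against existing resources. The sketch below, in Python, only builds the patch bodies and the matching `kubectl patch` command lines; it does not contact a cluster. The resource names are hypothetical, and live expansion assumes the volume's StorageClass has `allowVolumeExpansion: true`. This is not how the Altinity operator or the EBS Params Controller is implemented.

```python
import json


def scale_patch(replicas):
    """Merge patch that sets the replica count of a StatefulSet."""
    return {"spec": {"replicas": replicas}}


def pvc_resize_patch(new_size):
    """Merge patch that raises the storage request of a PersistentVolumeClaim."""
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


def kubectl_patch_command(kind, name, patch):
    """Argument list for `kubectl patch` using a JSON merge patch."""
    return ["kubectl", "patch", kind, name, "--type", "merge", "-p", json.dumps(patch)]


# Hypothetical resource names, for illustration only.
stop_compute = kubectl_patch_command("statefulset", "chi-replica-0", scale_patch(0))
grow_storage = kubectl_patch_command("pvc", "data-chi-replica-0-0", pvc_resize_patch("200Gi"))

if __name__ == "__main__":
    # Print the commands for review instead of running them.
    print(" ".join(stop_compute))
    print(" ".join(grow_storage))
```

Scaling to zero removes the pod but leaves the claim and its volume in place, so setting the replica count back to one reattaches the same data.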

Hodges concluded the talk with lessons learned in using Kubernetes storage for cloud-hosted analytic databases:

The K8s platform is a great option for running databases. It works portably across many environments, from different clouds and Minikube on a laptop to dozens of clusters with hundreds of nodes. Build on existing Kubernetes resources where possible.
Test performance carefully because the performance depends on database use cases and data access patterns.
The combination of Kubernetes and cloud block storage gives you separated storage and compute.
Use Kubernetes tricks like custom controllers to reach out to storage directly.

For additional information on KubeCon NA 2023, check out the conference website and the program schedule.

About the Author

Srini Penchikala
