

Grafana's Kubernetes Monitoring Helm Chart v4 Brings Multiple Fixes


Grafana Labs has released version 4 of its Kubernetes Monitoring Helm chart, describing it as the most significant update the chart has received since its introduction. The release, announced in April 2026 by Pete Wall and Beverly Buchanan, addresses a range of configuration problems that had accumulated as users scaled to larger and more complex deployments.

The Kubernetes Monitoring Helm chart provides a mechanism for sending metrics, logs, traces, and profiles from Kubernetes clusters to Grafana Cloud or a self-hosted Grafana stack. According to Wall and Buchanan, version 4 is designed to address pain points that users have encountered as their monitoring setups have grown, making the chart "more predictable, more flexible, and much easier to maintain, whether you manage one cluster or a hundred."

"Representing nearly six months of planning and development, it's designed to solve real pain points that users have hit as their monitoring setups have grown."
- Pete Wall and Beverly Buchanan

One of the more consequential structural changes is the conversion of destinations from a list to a map. In version 3, destinations were defined as a list of objects. This caused problems for teams managing multiple clusters with shared configuration files, and for those using GitOps tools such as Argo CD, Terraform, or Flux. Overriding a single property, such as a password, required referencing the destination by its position in the list. If the order of destinations changed, those overrides would silently apply to the wrong target. In version 4, each destination has a stable name, so destinations.prometheus.auth.password always refers to the Prometheus destination regardless of ordering. Helm's ability to merge map-based configurations across files also makes multi-cluster GitOps workflows more reliable.
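The practical effect shows up in multi-file GitOps setups. The sketch below illustrates how a per-cluster override file targets a destination by its stable map key; the path destinations.prometheus.auth.password comes from the announcement, while the other field names and values are illustrative.

```yaml
# base-values.yaml -- version 4 defines destinations as a map keyed by name
destinations:
  prometheus:
    type: prometheus
    url: https://prometheus.example.com/api/prom/push
    auth:
      type: basic
      username: metrics-writer

# cluster-override.yaml -- Helm merges maps by key, so this override
# always reaches the "prometheus" destination, regardless of how many
# other destinations exist or in what order they are declared
destinations:
  prometheus:
    auth:
      password: <cluster-specific-secret>
```

Under the version 3 list format, the same override had to reference the destination by index, which would silently target the wrong entry if the list was ever reordered.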

Collectors have undergone a similar restructuring. Version 3 shipped with hard-coded collector names such as alloy-metrics, alloy-logs, and alloy-singleton, each tied to a specific deployment type. The routing of features to collectors was buried in the chart's internal code, meaning that understanding which feature ran on which collector required reading source code rather than configuration files. Version 4 removes these hard-coded names entirely. Users now define collectors as a map and assign one or more presets that describe the deployment shape, such as clustered, statefulset, or daemonset. Features are then explicitly assigned to a named collector, removing the hidden routing logic from the chart internals.
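Based on that description, a version 4 values file might look roughly like the following sketch; the preset names clustered and daemonset come from the release, while the surrounding key names are illustrative rather than taken from the chart's actual values schema.

```yaml
# Collectors are a named map; presets describe the deployment shape
collectors:
  metrics:
    presets:
      - clustered        # clustered workload for metric collection
  logs:
    presets:
      - daemonset        # one pod per node for log collection

# Each feature is explicitly assigned to a named collector; leaving a
# feature unassigned produces an error naming that feature, rather
# than a silently chosen default
clusterMetrics:
  enabled: true
  collector: metrics
podLogs:
  enabled: true
  collector: logs
```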

"If you forget to specify, it will give you a message telling you which feature still needs to be assigned to a collector rather than silently picking one for you."
- Pete Wall and Beverly Buchanan

The release also separates the deployment of backing services from the features that consume their data. In version 3, enabling a feature such as clusterMetrics would silently deploy services like Node Exporter, kube-state-metrics, and OpenCost behind the scenes. This caused problems for teams whose clusters already ran these services, as duplicate deployments would appear without warning. Version 4 introduces a telemetryServices key that makes service deployment an explicit step. Teams that already have Node Exporter running can instruct the chart to skip deployment and point the feature at the existing instance instead. As Wall and Buchanan note, the approach means "no more surprise deployments."
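In configuration terms, the change looks roughly like this sketch; the telemetryServices key is named in the announcement, but the nested fields are illustrative and should be checked against the chart's values reference.

```yaml
# telemetryServices makes backing-service deployment an explicit step
telemetryServices:
  kube-state-metrics:
    deploy: true     # let the chart manage this service
  node-exporter:
    deploy: false    # a Node Exporter already runs in-cluster; point the
                     # host metrics feature at the existing instance instead
```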

The handling of cluster metrics has been reorganised into three separate features. The version 3 clusterMetrics feature covered Kubernetes cluster metrics, Linux and Windows host metrics, energy metrics via Kepler, and cost metrics via OpenCost, all within a single configuration block. Version 4 splits these into clusterMetrics, hostMetrics, and costMetrics, each with its own values file. Each feature's configuration only exposes options relevant to its own concern.
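A values file enabling the three features might look like this minimal sketch; the feature names are from the release, and the nested options each feature exposes are omitted.

```yaml
clusterMetrics:
  enabled: true    # Kubernetes cluster-state metrics
hostMetrics:
  enabled: true    # Linux/Windows node-level metrics
costMetrics:
  enabled: false   # OpenCost-based cost metrics, enabled separately
```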

A further change addresses memory usage in the pod log pipeline. In version 3, the chart applied all Kubernetes pod labels and annotations as log labels, then used a labelsToKeep list to filter them down. This required Alloy to allocate memory for potentially hundreds of labels only to discard most of them, and some users traced memory problems in their log-collecting Alloy instances directly to this behaviour. Version 4 removes labelsToKeep entirely; pod labels and annotations are no longer applied in bulk. Instead, users declare explicitly which labels they want promoted. According to the Grafana documentation, adding a label is now a one-line change rather than a full redefinition of a default list.
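The contrast can be sketched as follows; labelsToKeep is the version 3 key named in the post, while the version 4 key and label names shown here are illustrative.

```yaml
# Version 3: every pod label and annotation became a log label,
# then labelsToKeep filtered most of them back out
podLogs:
  labelsToKeep:
    - app_kubernetes_io_name
    - namespace

# Version 4: only explicitly declared labels are promoted, so Alloy
# never allocates memory for labels that would be discarded
podLogs:
  labels:
    app: app.kubernetes.io/name   # one line per promoted label
```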

The Grafana Kubernetes Monitoring Helm chart is not the only approach to cluster-level monitoring. The kube-prometheus-stack, maintained by the prometheus-community organisation, bundles Prometheus, Grafana, Alertmanager, Node Exporter, kube-state-metrics, and the Prometheus Operator into a single Helm install. That chart uses the Prometheus Operator's custom resources, such as ServiceMonitors and PrometheusRules, to provide declarative scrape configuration. It is a common choice for teams building a self-hosted observability stack independent of Grafana Cloud. The Grafana chart, by contrast, targets teams sending telemetry to Grafana Cloud or a managed Grafana stack, and adds support for profiles and cost metrics out of the box. The two charts serve related but distinct use cases.
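For comparison, declarative scrape configuration in kube-prometheus-stack is expressed through Prometheus Operator custom resources such as a ServiceMonitor; the names, namespace, and port below are placeholders for a real service exposing metrics.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: kube-prometheus-stack     # matched by the operator's selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app   # selects the service to scrape
  endpoints:
    - port: http-metrics               # named port on the service
      interval: 30s
```

The Prometheus Operator watches for these resources and regenerates the Prometheus scrape configuration automatically, which is the declarative model the Grafana chart replaces with its own feature-and-destination configuration.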

Grafana Labs has provided a migration tool that accepts a version 3 values file and produces a version 4-compatible output. The tool handles the structural conversions, including converting lists to maps and splitting overloaded features. All chart examples in the Grafana documentation and the repository have been updated to reflect the version 4 format.

The release attracted commentary in the Kubernetes community. Kubesimplify wrote on LinkedIn that "nearly every fragile pattern from v3 has been replaced," pointing specifically to the shift from lists to maps and the opt-in approach to pod log labels as the changes with the most immediate practical benefit. Kubesimplify also noted that the memory reduction in Alloy was a "direct result" of the label change.

For background on monitoring Kubernetes in production, InfoQ published a checklist by Ran Isenberg and Elad Beber in March 2025 that covers observability practices for SRE teams, including guidance on Prometheus and Grafana in cluster environments.

Additional examples are available in the Kubernetes Monitoring Helm chart repository.
