
Grafana Cloud Kubernetes Monitoring with Machine Learning Predictions

Managing cloud costs can be challenging as Kubernetes fleets scale. Grafana Cloud has introduced a cost-monitoring feature within "Kubernetes Monitoring" to address this issue. In particular, Grafana Cloud’s Kubernetes Monitoring offers ML predictions for CPU and memory usage. It can also identify CPU outlier pods, allowing users to anticipate potential resource issues before they impact mission-critical infrastructure or budgets.
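The article does not say how Grafana Cloud detects CPU outlier pods. As a rough illustration of the general idea only, the sketch below flags pods whose CPU usage sits far from the group median, using a modified z-score based on the median absolute deviation. The function name, the sample pod names, and the 3.5 threshold are all assumptions made for this example, not part of the product.

```python
from statistics import median


def cpu_outlier_pods(cpu_by_pod, threshold=3.5):
    """Return the names of pods whose CPU usage is far from the group median.

    cpu_by_pod maps a pod name to its CPU usage in cores (hypothetical input).
    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    """
    if not cpu_by_pod:
        return []
    values = list(cpu_by_pod.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half of the pods report the same value; no usable spread.
        return []
    return sorted(
        pod
        for pod, v in cpu_by_pod.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Illustrative data: four similar replicas and one that uses far more CPU.
usage = {"api-1": 0.20, "api-2": 0.22, "api-3": 0.19, "api-4": 0.21, "api-5": 1.50}
outliers = cpu_outlier_pods(usage)
```

A median-based score is used here rather than mean and standard deviation because a single runaway pod would otherwise inflate the spread estimate and hide itself.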

To generate forecasts for CPU and memory usage, users specify a source query (the time series to be modeled) and configure the machine learning model. The system then trains the model automatically in the background. Users can adjust model parameters to tune the model and improve its predictions. Once training completes, users can query the model for the series' predicted values at future points in time. The model also returns confidence bounds for each predicted value.
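The workflow described above (fit a model to a historical series, then ask for a future value with confidence bounds) can be sketched in a few lines. This is not Grafana's model, which the article does not describe; it is a minimal stand-in that fits a straight-line trend by least squares and derives a constant-width band from the residual spread. The function names and the `z=1.96` default (roughly a 95% band if residuals are normally distributed) are assumptions for illustration.

```python
from statistics import mean, stdev


def fit_linear_trend(values):
    """Least-squares fit of y = intercept + slope * t for t = 0..n-1.

    Returns (intercept, slope, sigma), where sigma is the sample standard
    deviation of the fit residuals.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 samples to fit a trend")
    ts = range(n)
    t_mean = mean(ts)
    y_mean = mean(values)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in zip(ts, values)]
    return intercept, slope, stdev(residuals)


def forecast(values, steps_ahead, z=1.96):
    """Predict the value steps_ahead samples past the end of the series.

    Returns (lower, point, upper). The band is point +/- z * sigma, a
    simplification that does not widen with the forecast horizon.
    """
    intercept, slope, sigma = fit_linear_trend(values)
    t = len(values) - 1 + steps_ahead
    point = intercept + slope * t
    return point - z * sigma, point, point + z * sigma


# Illustrative memory usage samples (GiB) at regular intervals.
history = [2.0, 2.1, 2.3, 2.2, 2.5, 2.6, 2.6, 2.9]
lower, point, upper = forecast(history, steps_ahead=4)
```

A production forecasting model would typically also account for seasonality and let the uncertainty grow with the horizon; the sketch keeps only the source-query, train, predict-with-bounds shape of the workflow.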

Grafana Kubernetes Monitoring offers a unified suite of tools to monitor your Kubernetes environment proactively for optimal resource utilization and reactively for issue troubleshooting and early detection. In Grafana Cloud, you can gather and store metrics, pod logs, cluster events, traces, and cost metrics.

The platform provides ready-to-use, pre-configured visualizations that help users identify cost-intensive components and reallocate resources within their fleets. It gives an overview of cost attribution and actionable insights for reducing Kubernetes infrastructure expenses. By combining the cost monitoring feature, which is based on the OpenCost project, with Grafana’s resource utilization efficiency tools, users gain real-time insights and better control over resource utilization and associated costs.

Managing vast volumes of Kubernetes telemetry data can be overwhelming when troubleshooting performance issues. Grafana Cloud addresses this by introducing a comprehensive homepage within Kubernetes Monitoring. This homepage highlights the most critical metrics and helps users identify infrastructure issues quickly. It can spot pods stuck in states such as ContainerCreating, CrashLoopBackOff, ImagePullBackOff, and PodInitializing, as well as capacity shortages for node CPU, persistent volumes, and storage.
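To make the pod states named above concrete, here is a small sketch of how such pods could be picked out of pod objects shaped like the Kubernetes API's Pod resource, where a waiting container reports its reason under `status.containerStatuses[].state.waiting.reason`. It works on plain dictionaries and does not call any cluster; the function name and the sample pods are hypothetical, and this is not how the Grafana homepage is implemented.

```python
# Waiting reasons mentioned in the article. ContainerCreating and
# PodInitializing are normal for a short time during startup, so a real
# check would likely also consider how long the pod has been waiting.
PROBLEM_REASONS = {
    "ContainerCreating",
    "CrashLoopBackOff",
    "ImagePullBackOff",
    "PodInitializing",
}


def pods_with_waiting_problems(pods):
    """Map pod name -> list of matching waiting reasons found in its containers."""
    flagged = {}
    for pod in pods:
        name = pod.get("metadata", {}).get("name", "<unknown>")
        status = pod.get("status", {})
        container_statuses = (
            status.get("initContainerStatuses", [])
            + status.get("containerStatuses", [])
        )
        for cs in container_statuses:
            waiting = cs.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") in PROBLEM_REASONS:
                flagged.setdefault(name, []).append(waiting["reason"])
    return flagged


# Hypothetical sample data: one healthy pod and two with problems.
sample_pods = [
    {"metadata": {"name": "web-0"},
     "status": {"containerStatuses": [{"state": {"running": {}}}]}},
    {"metadata": {"name": "worker-1"},
     "status": {"containerStatuses": [
         {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
    {"metadata": {"name": "batch-2"},
     "status": {"containerStatuses": [
         {"state": {"waiting": {"reason": "ImagePullBackOff"}}}]}},
]
problems = pods_with_waiting_problems(sample_pods)
```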


Kubernetes Monitoring home page


The number of integrations available in Grafana Cloud’s Kubernetes Monitoring has been significantly expanded. These integrations come with prebuilt dashboards, scraping rules, and alerts to simplify service monitoring within your fleet. This enhancement allows users to access out-of-the-box monitoring for various technologies, including Aerospike, Apache ActiveMQ, Cilium, CoreDNS, etcd, NGINX, GitLab, Apache Kafka, CockroachDB, Apache Cassandra, PostgreSQL, and MySQL.

Grafana Cloud's Kubernetes Monitoring empowers users to eliminate unnecessary spending, enhance efficiency, and maximize return on investment. With Kubernetes cost monitoring, users can take control of their budget and implement proactive alerting mechanisms to make data-driven decisions regarding resource allocation, scaling strategies, and technology investments.

Alternatives to Grafana Kubernetes Monitoring include Elastic Observability and Datadog. Both platforms provide monitoring solutions aided by artificial intelligence.
