
Kubernetes Control Plane Metrics Now Available in Google Kubernetes Engine

Google has announced the general availability of Kubernetes control plane metrics in Google Kubernetes Engine (GKE). These metrics are directly integrated with Google Cloud Monitoring, providing a single solution for troubleshooting issues with GKE. Integration with third-party observability tooling is also possible via the Cloud Monitoring API.

While GKE fully manages the Kubernetes control plane, the newly exposed metrics can be useful for troubleshooting issues. For example, a combination of metrics can be used to assess the health of the API server: apiserver_request_total and apiserver_request_duration_seconds track the load the API server is experiencing, the number of requests returning errors, and the request response latency.
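As a rough illustration of how these two metrics might be combined, the sketch below builds PromQL expressions for an error ratio and a latency quantile. The helper names, the 5-minute window, the 5xx filter on the code label, and the grouping by verb are assumptions made for this example and do not come from Google's announcement.

```python
def error_ratio_query(window: str = "5m") -> str:
    """PromQL: share of API server requests answered with a 5xx status code."""
    # Assumption: the `code` label carries the HTTP status of each request.
    errors = f'sum(rate(apiserver_request_total{{code=~"5.."}}[{window}]))'
    total = f"sum(rate(apiserver_request_total[{window}]))"
    return f"{errors} / {total}"


def latency_quantile_query(quantile: float = 0.99, window: str = "5m") -> str:
    """PromQL: request latency quantile per verb, computed from histogram buckets."""
    buckets = f"rate(apiserver_request_duration_seconds_bucket[{window}])"
    return f"histogram_quantile({quantile}, sum by (le, verb) ({buckets}))"


if __name__ == "__main__":
    print(error_ratio_query())
    print(latency_quantile_query())
```

The resulting strings could be pasted into a PromQL editor; the thresholds worth alerting on depend on the cluster and are not suggested here.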

The newly available metrics can also assist in troubleshooting scheduling issues. The following metrics can all be used to help determine why pods are not moving from pending to scheduled:

scheduler_pending_pods
scheduler_schedule_attempts_total
scheduler_preemption_attempts_total
scheduler_preemption_victims
scheduler_scheduling_attempt_duration_seconds

An increase in the number of pending pods can indicate a problem with scheduling, which may be caused by an underlying resource issue.
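One way to quantify a scheduling problem is to look at how many scheduling attempts failed over an interval. The sketch below assumes two snapshots of scheduler_schedule_attempts_total, keyed by the scheduler's result label (for example "scheduled", "unschedulable", "error"); the function name and the sample numbers are hypothetical.

```python
from __future__ import annotations


def unschedulable_share(before: dict[str, float], after: dict[str, float]) -> float:
    """Fraction of scheduling attempts in the interval that ended unschedulable."""
    # Counters only grow, so the per-result difference is the activity in the interval.
    deltas = {result: after[result] - before.get(result, 0.0) for result in after}
    total = sum(deltas.values())
    return deltas.get("unschedulable", 0.0) / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical counter snapshots taken a few minutes apart.
    earlier = {"scheduled": 100, "unschedulable": 5, "error": 0}
    later = {"scheduled": 160, "unschedulable": 45, "error": 0}
    print(unschedulable_share(earlier, later))
```

A high share alongside a growing scheduler_pending_pods value would point toward pods that cannot be placed rather than a slow scheduler.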

The new metrics are all displayed within the Kubernetes Engine portion of the Cloud Console. This is available within the Observability tab under Control plane.

Cloud Console interface showing new Kubernetes control plane metrics (credit: Google)

With this integration, it is possible to create alerting policies in Cloud Alerting on these newly available metrics. Continuing with the scheduling issues described above, an alert could be created on both scheduler_preemption_attempts_total and scheduler_pending_pods. The first metric going up could indicate that higher priority pods are displacing other pods from being scheduled. However, both metrics moving up could mean there are not enough resources available for the pods.
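The reasoning above can be written out as a small decision function. This is only a sketch of the interpretation described in the paragraph, not how Cloud Alerting evaluates policies; the function name, its inputs, and the returned messages are hypothetical.

```python
def diagnose(preemption_increase: float, pending_before: int, pending_after: int) -> str:
    """Interpret movement in preemption attempts and pending pods over an interval."""
    preempting = preemption_increase > 0  # scheduler_preemption_attempts_total went up
    pending_rising = pending_after > pending_before  # scheduler_pending_pods went up
    if preempting and pending_rising:
        return "possible resource shortage"
    if preempting:
        return "higher-priority pods displacing others"
    if pending_rising:
        return "pending pods rising without preemption"
    return "no signal"


if __name__ == "__main__":
    print(diagnose(preemption_increase=12, pending_before=3, pending_after=3))
    print(diagnose(preemption_increase=12, pending_before=3, pending_after=20))
```

In practice the comparison would use a window and a threshold rather than a strict increase, so that brief fluctuations do not trigger the alert.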

When enabled, the metrics will be collected using the Google Cloud Managed Service for Prometheus. The metrics will be sent to Cloud Monitoring in the same GCP project as the Kubernetes cluster. These metrics can then be queried using PromQL through both the Cloud Monitoring API and the Metrics Explorer. In addition, any third-party observability tool could ingest the metrics using the Cloud Monitoring API.
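For readers who want to try a PromQL query outside the console, the sketch below builds a request URL for the Prometheus-compatible query endpoint of the Cloud Monitoring API. The endpoint path reflects Google's Managed Service for Prometheus documentation as best understood here and should be verified before use; the helper name and the project ID are hypothetical, and the request itself (which needs an OAuth bearer token) is deliberately left out.

```python
from urllib.parse import urlencode


def promql_query_url(project_id: str, query: str) -> str:
    """Build the URL for an instant PromQL query against Cloud Monitoring."""
    # Assumption: Prometheus-compatible HTTP API path; check it against current docs.
    base = (
        "https://monitoring.googleapis.com/v1/"
        f"projects/{project_id}/location/global/prometheus/api/v1/query"
    )
    return f"{base}?{urlencode({'query': query})}"


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    print(promql_query_url("my-project", "scheduler_pending_pods"))
```

Because the endpoint follows the Prometheus HTTP API shape, a third-party tool that already speaks that API could in principle be pointed at the same base URL.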

GKE clusters running on control plane version 1.23.6 and up can access metrics from the Kubernetes API server, scheduler, and controller manager. Note that these metrics are not available for GKE Autopilot clusters. The following command can be used to enable the collection of metrics from the API server, scheduler, and controller manager:

gcloud container clusters update [CLUSTER_ID] \
  --zone=[ZONE] \
  --project=[PROJECT_ID] \
  --monitoring=SYSTEM,API_SERVER,SCHEDULER,CONTROLLER_MANAGER

The metrics can also be configured via Terraform using the monitoring_config block.

Kubernetes Control Plane metrics are charged at the standard price for metrics ingested by the Google Cloud Managed Service for Prometheus. For more details on the release, please refer to the blog post.
