Prometheus Adds Long-Term Support Model and Improved Remote Write Mode

Prometheus, the open-source monitoring tool, has added several new features, including a reduced-functionality mode optimized for remote write. Additional improvements include a new HTTP service discovery mechanism, native histogram support, additional integrations for Alertmanager, and a new long-term support model.

The new Prometheus Agent mode provides a lightweight installation optimized for ingesting metrics and forwarding them to external backends. Bartlomiej Plotka, principal software engineer at Red Hat, notes that Prometheus use cases are evolving to include "clusters created on-demand within seconds", platforms such as kcp and Fargate, and edge clusters or networks. In all of these cases, data cannot be stored on the nodes and must be transferred to a remote location.

Plotka notes that Agent mode builds on the existing Prometheus remote write capability, which forwards all or some of the collected metrics to a remote location via the Remote Write API. Plotka explains:

"Agent mode optimizes Prometheus for the remote write use case. It disables querying, alerting, and local storage, and replaces it with a customized TSDB [time series database] WAL [write-ahead log]. Everything else stays the same: scraping logic, service discovery and related configuration."

Prometheus Agent mode high-level architecture (source: Prometheus)

Agent mode provides several benefits. The Agent TSDB WAL removes data as soon as it has been successfully written to the remote endpoint. If the remote endpoint cannot be reached, the data is kept locally for up to two hours; there is an open issue to increase that limit. Agent mode also improves horizontal scalability for data ingestion: Plotka notes that it is essentially stateless, so instances can be scaled in response to the number of targets or metrics.
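
The buffering behavior described above can be illustrated with a toy model. The Python sketch below is not how Prometheus implements its WAL; the ToyAgentBuffer class, the send callback, and the in-memory queue are all hypothetical names chosen for the example. It only shows the two rules from the text: a sample is dropped once the remote end accepts it, and an unsent sample is discarded after two hours.

```python
from collections import deque
from dataclasses import dataclass

RETENTION_SECONDS = 2 * 60 * 60  # the two-hour limit mentioned in the article


@dataclass
class Sample:
    timestamp: float  # seconds, on whatever clock the caller uses
    value: float


class ToyAgentBuffer:
    """Hypothetical stand-in for the Agent's WAL; not Prometheus code."""

    def __init__(self, retention: float = RETENTION_SECONDS) -> None:
        self.retention = retention
        self.pending = deque()  # oldest sample first

    def append(self, sample: Sample) -> None:
        self.pending.append(sample)

    def flush(self, send, now: float) -> int:
        """Try to ship pending samples and return how many were sent.

        `send` is a caller-supplied function that returns True when the
        remote endpoint accepted the sample.
        """
        # Discard samples that have outlived the retention window.
        while self.pending and now - self.pending[0].timestamp > self.retention:
            self.pending.popleft()
        sent = 0
        # Remove a sample only after the remote end has accepted it.
        while self.pending and send(self.pending[0]):
            self.pending.popleft()
            sent += 1
        return sent
```

Calling flush with a send function that always returns False models an outage: samples stay queued until either a later flush succeeds or they age past the retention window.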

A new HTTP-based service discovery mechanism was also recently released. It allows custom integrations to supply scrape targets from other sources. Unlike the existing file-based discovery, HTTP service discovery does not require running a sidecar process. Updates are not picked up instantly, as they are with file service discovery; instead, targets are refreshed at the interval set by the refresh_interval configuration value.
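
As a rough illustration of what such a custom integration might look like, the Python sketch below serves a static list of target groups as JSON using only the standard library. The addresses, label values, port, and the function names build_target_groups and serve are assumptions made for the example; a real integration would build the list from an inventory system or API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_target_groups() -> list:
    """Return target groups as a list of {"targets": [...], "labels": {...}}."""
    # Hypothetical inventory; hosts and labels are made up for illustration.
    return [
        {
            "targets": ["10.0.0.5:9100", "10.0.0.6:9100"],
            "labels": {"env": "staging", "team": "platform"},
        }
    ]


class SDHandler(BaseHTTPRequestHandler):
    """Answers every GET with the current target groups as JSON."""

    def do_GET(self) -> None:
        body = json.dumps(build_target_groups()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving the discovery endpoint on localhost."""
    HTTPServer(("127.0.0.1", port), SDHandler).serve_forever()
```

Running serve() starts the endpoint. A scrape job would then point the url of an http_sd_configs entry at it, with refresh_interval controlling how often Prometheus re-fetches the list.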

The 2.40.0 release introduced experimental support for native high-resolution histograms. These remove the need to pre-define metric buckets, making percentiles easier to calculate. Support is enabled via the feature flag --enable-feature=native-histograms. Note that existing histograms won't switch unless NativeHistogramBucketFactor is set.
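
To give a sense of why buckets no longer need to be defined up front, the Python sketch below models exponential bucketing. It assumes that bucket boundaries grow by a factor of 2^(2^-schema), that the schema is an integer from -4 to 8, and that a requested bucket factor is mapped to the coarsest schema whose growth does not exceed it. The function names are hypothetical and this is a simplified model, not the Prometheus or client library implementation.

```python
import math


def pick_schema(bucket_factor: float) -> int:
    """Pick a resolution ("schema") for a requested bucket growth factor.

    Assumption: the coarsest schema whose growth factor does not exceed
    the request is chosen, then clamped to the range -4..8.
    """
    if bucket_factor <= 1:
        raise ValueError("bucket_factor must be greater than 1")
    schema = -math.floor(math.log2(math.log2(bucket_factor)))
    return max(-4, min(8, schema))


def growth_factor(schema: int) -> float:
    """Ratio between consecutive bucket boundaries for a schema."""
    return 2 ** (2 ** -schema)


def bucket_index(value: float, schema: int) -> int:
    """Index i such that growth**(i-1) < value <= growth**i, for value > 0."""
    return math.ceil(math.log2(value) * 2 ** schema)
```

Under these assumptions a requested factor of 1.1 maps to schema 3, where each bucket boundary is about 9% larger than the previous one. Any positive observation then falls into a bucket that can be computed on the fly rather than looked up in a fixed list.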

Alertmanager is a tool for dispatching alerts based on metric values; it provides deduplication, grouping, and alert silencing. It integrates with a variety of endpoints including email, Slack, PagerDuty, and OpsGenie. There are now new integrations with Telegram, Discord, and Cisco Webex.

A new long-term support (LTS) model for Prometheus has also been announced. Starting with the 2.37 release, LTS versions will receive bug fixes, security patches, and documentation improvements for at least six months. Unstable and experimental features, as well as OpenBSD support, are excluded from LTS releases.

Prometheus is open-source and available under the Apache 2.0 license. More details on recent improvements can be found within the changelog and a recent episode of OpenObservability Talks.

About the Author

Matt Campbell
