Grafana Adds Log Data Correlation to Time Series Metrics

The Grafana team announced an alpha version of Loki, their logging platform that ties in with other Grafana features like metrics query and visualization. Loki adds a new client agent, "promtail", and server-side components for log metadata indexing and storage.

Loki aims to add traceability between metrics, logs and traces, although the initial release targets linking only the first two. Logs are aggregated and indexed by a new agent with initial support for Kubernetes pods. It uses the same Kubernetes APIs and relabelling configuration as Prometheus, so that the metadata attached to metrics and logs remains the same for later querying. Prometheus is not a prerequisite for using Loki, but having it makes things easier. "Metadata between metrics and logs matching is critical for us and we initially decided to just target Kubernetes", explained the authors in their article.

According to its design document, Loki is intended to "minimize the cost of the context switching between logs and metrics". Correlation between logs and time series data applies both to data generated by normal metrics collectors and to custom metrics generated from logs. Public cloud providers like AWS and GCP provide custom metrics extraction, and AWS also provides the ability to navigate from the metrics to the logs. The two providers use different query languages for log data. Loki also aims to solve the problem of logs being lost from ephemeral sources like Kubernetes pods when they crash.

Loki is made up of the promtail agent on the client side and the distributor and ingester components on the server side. The querying component exposes an API for handling queries. The ingester and distributor components are mostly taken from the code of Cortex, which provides a scalable, highly available Prometheus-as-a-service. Distributors receive log data from promtail agents, generate a consistent hash from the labels and the user id in the log data, and send it to multiple ingesters. Ingesters receive the entries and build "chunks" (a set of logs for a specific label and a time span), which are compressed using gzip. Ingesters build the indices from the metadata (labels), not the log content, so that logs can be easily queried and correlated with the labels on time series metrics. This choice was a trade-off between operational complexity and features. Chunks are periodically flushed into an object store like Amazon S3, and indices into Cassandra, Bigtable or DynamoDB.
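The distributor step described above can be illustrated with a minimal sketch of consistent hashing. This is not Loki's or Cortex's code: the `Ring` class, the `stream_key` helper, the SHA-256 hash, the 64 tokens per ingester and the replication factor of 3 are all assumptions made for illustration.

```python
import bisect
import hashlib


def stream_key(user_id, labels):
    """Build a canonical key from the user id and the sorted label set."""
    pairs = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return f"{user_id}/{{{pairs}}}"


def _hash(value):
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring; each ingester owns several tokens (assumed design)."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self._names = set(ingesters)
        self._tokens = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in self._names
            for i in range(tokens_per_ingester)
        )
        self._positions = [pos for pos, _ in self._tokens]

    def lookup(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct ingesters."""
        want = min(replicas, len(self._names))
        start = bisect.bisect(self._positions, _hash(key))
        chosen = []
        offset = 0
        while len(chosen) < want:
            _, name = self._tokens[(start + offset) % len(self._tokens)]
            if name not in chosen:
                chosen.append(name)
            offset += 1
        return chosen


# Hypothetical ingester names and labels, for illustration only.
ring = Ring(["ingester-1", "ingester-2", "ingester-3", "ingester-4"])
key = stream_key("tenant-a", {"job": "api", "namespace": "prod"})
targets = ring.lookup(key)
```

Because the key depends only on the user id and the labels, every entry of a given log stream lands on the same set of ingesters, which is what lets each ingester build chunks per label set.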

The querying API takes a time range and label selectors, and checks the index for matching chunks. It also talks to ingesters for recent data that has not been flushed yet. Searches can use regular expressions, but because the log content is not indexed, they can be slower than they would be against a full-text index. Loki is open source and can be tried out on Grafana's site.
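The query path can be sketched in the same spirit. The example below is a simplified in-memory model, not Loki's API: the `Chunk`, `build_chunk` and `Store` names, the "timestamp line" text encoding and the integer timestamps are hypothetical, and the step of asking ingesters for unflushed data is left out. It shows only that the index is keyed on labels, and that a regular expression has to be run against decompressed chunk contents.

```python
import gzip
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    labels: dict
    start: int   # timestamp of the first entry
    end: int     # timestamp of the last entry
    data: bytes  # gzip-compressed, newline-separated "timestamp line" entries


def build_chunk(labels, entries):
    """Compress (timestamp, line) entries for one label set into a chunk."""
    entries = sorted(entries)
    raw = "\n".join(f"{ts} {line}" for ts, line in entries).encode()
    return Chunk(labels, entries[0][0], entries[-1][0], gzip.compress(raw))


class Store:
    """Chunks plus an index built from labels only, never from log content."""

    def __init__(self):
        self.chunks = []
        self.index = {}  # (label name, label value) -> set of chunk ids

    def flush(self, chunk):
        chunk_id = len(self.chunks)
        self.chunks.append(chunk)
        for pair in chunk.labels.items():
            self.index.setdefault(pair, set()).add(chunk_id)

    def query(self, selectors, start, end, pattern=None):
        # Step 1: use the label index to find candidate chunks.
        ids = set(range(len(self.chunks)))
        for pair in selectors.items():
            ids &= self.index.get(pair, set())
        regex = re.compile(pattern) if pattern else None
        results = []
        for chunk_id in ids:
            chunk = self.chunks[chunk_id]
            # Step 2: skip chunks whose time span misses the query range.
            if chunk.end < start or chunk.start > end:
                continue
            # Step 3: content is not indexed, so decompress and scan every line.
            for entry in gzip.decompress(chunk.data).decode().split("\n"):
                ts_text, line = entry.split(" ", 1)
                ts = int(ts_text)
                if start <= ts <= end and (regex is None or regex.search(line)):
                    results.append((ts, line))
        return sorted(results)


# Made-up sample data, for illustration only.
store = Store()
store.flush(build_chunk(
    {"job": "api", "namespace": "prod"},
    [(100, "GET /health 200"), (105, "GET /orders 500"), (110, "POST /orders 201")],
))
store.flush(build_chunk(
    {"job": "worker", "namespace": "prod"},
    [(102, "job started"), (108, "job failed: timeout")],
))
errors = store.query({"job": "api"}, 100, 110, pattern=r" 5\d\d$")
```

The label and time-range checks narrow the search cheaply, while the regular expression cost grows with the volume of log text in the surviving chunks, which is the trade-off the article describes.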
