Uber Open Sources Its Large Scale Metrics Platform M3

Uber's engineering team released its metrics platform M3, which it has been using internally for some years, as open source. The platform was built to replace its Graphite based system, and provides cluster management, aggregation, collection, storage management, a distributed time series database (TSDB) and a query engine with its own query language M3QL.

Uber's previous metrics collection and monitoring system was based on Graphite, backed by a sharded Carbon cluster, with Nagios for alerting and Grafana for dashboarding. Issues with this included poor resiliency and clustering, high operational cost to expand the Carbon cluster, and a lack of replication which made each node a single point of failure. M3 - the new metrics system - was born out of these shortcomings. In addition to scalability, global, responsive querying across datacenters, the goals for the new system were the ability to tag metrics, and maintain backwards compatibility with services that emitted metrics in the StatsD and Graphite format. Rob Skillington, staff software engineer at Uber, describes the architecture of M3 in a recent article. M3 currently stores 6.6 billion time series, aggregates 500 million metrics per second, and stores 20 million metrics per second.

The initial version of M3 had open source components like statsite for aggregation, Cassandra for storage, and Elasticsearch for indexing. Each component was gradually replaced by an in-house implementation because of increasing operational overhead and a demand for new features. Due to the widespread use of Prometheus in multiple teams at Uber, M3 was built to integrate with Prometheus as a remote storage backend.

The Prometheus integration is via a sidecar component that writes to local regional M3DB instances and fans out queries "to inter-regional coordinators which coordinate reads from their local regional M3DB (the storage engine) instances". This model is similar to the the way that Thanos, an extension to Prometheus that provides cross-cluster federation, unlimited storage and global querying across clusters, works. However, the Uber team did not choose Thanos for various reasons, with the primary one being high latencies for metrics that are not stored locally. Thanos pulls and caches metrics data from AWS S3, and the associated latencies as well as the additional disk usage for the cache were unfeasible due to Uber's latency requirements and the large amount of data.

M3's query engine provides a single global view of all metrics without cross region replication. Metrics are written to local regional M3DB instances and replication is local to a region. Queries go to both the regional local instances as well as to coordinators in remote regions where metrics are stored. The results are aggregated locally, and future work is planned wherein any query aggregation would happen at the remote coordinators.

M3 lets users specify the retention period and granularity per metric for storage, like Carbon does. M3's storage engine replicates each metric to three replicas in a region. To reduce disk usage, data is compressed using a custom compression algorithm. Most time series databases have a compaction feature where existing smaller data blocks are rewritten into larger ones, and restructured to improve query performance. M3DB avoids compactions where possible, to maximize the utilization of host resources for more concurrent writes and provide steady write/read latency.

Skillington says in the article that "M3DB itself only compacts time-based data together when absolutely necessary, such as backfilling data or when it makes sense to combine time window index files together." Metrics are downsampled using a streaming model where the downsampling happens as the metrics come in.

M3's own query language - M3QL - is used internally at Uber due to features that are not available in PromQL. There are limits to the cardinality of metrics that can be handled, which are more in terms of querying than of storage. M3's storage also optimizes access times by utilizing Bloom filters and indexes in memory-mapped files. A Bloom filter is used to determine if something might exist in a set, and in M3 it's used to determine if a series that is queried for needs to be retrieved from disk. The team is working on adding support for running M3 on Kubernetes.

M3 is written in Go and available on Github.

InfoQ Software Architects' Newsletter

Login with:

Don't have an InfoQ account?

InfoQ Article Contest

Rate this Article

This content is in the DevOps topic

Related Topics:

Related Editorial

Related Sponsored Content

Popular across InfoQ

The InfoQ Newsletter