
Metrics Collection from Large Scale IoT Deployments at Vivint

by Hrishikesh Barua on Apr 15, 2018. Estimated reading time: 3 minutes


Vivint's engineering team built their own platform to collect and analyze metrics from their devices. The key motivation for writing their own system was to store only aggregated data and focus on its analysis, which they achieve with their Rothko project.

Vivint is a provider of smart home devices. The fundamental design decision that differentiates Rothko from systems like Graphite and OpenTSDB is to store aggregated data instead of every individual data point. This was a conscious trade-off between not storing every data point and still being able to pinpoint issues. At the same time, the data had to remain available for statistical analysis without losing any key features needed for such analysis.

Rothko allows looking at overall distributions of metrics and analyzing them. Since individual metrics are not stored, does the team ever run into situations where issues with individual devices need to be diagnosed? InfoQ got in touch with Jeff Wendling, software engineer at Vivint, to find out more about this and about Rothko’s architecture:

Indeed, we don't store the individual data points. This is solved in two ways. One, we can easily and cheaply store the minimum and maximum as well as who they came from, so we do. That helps when they're the most deviant outlier. Two, since every device is sending data approximately every 30 minutes, we have a "firehose" that lets us tap into the data and filter out specific metrics or devices, etc. Assuming that it's still sending, we can usually figure out who it is. Of course, both of these methods don't guarantee you'll determine the problem, but it's a cheap and easy 80% solution for 20% of the effort, which fits in with the Rothko principles.
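
The first of the two approaches Wendling describes, keeping the minimum and maximum together with the instance that reported them, can be illustrated with a minimal Go sketch. This is not Rothko's code: the `Extremes` type, its fields, and the device names are hypothetical and only show why the bookkeeping is cheap.

```go
package main

import "fmt"

// Extremes is a hypothetical aggregate for one metric. It keeps only the
// smallest and largest value seen, plus the instance that reported each.
type Extremes struct {
	Min, Max     float64
	MinID, MaxID string
	Count        int
}

// Observe folds one data point into the aggregate. The raw point itself
// is not retained, so memory use stays constant per metric.
func (e *Extremes) Observe(id string, v float64) {
	if e.Count == 0 || v < e.Min {
		e.Min, e.MinID = v, id
	}
	if e.Count == 0 || v > e.Max {
		e.Max, e.MaxID = v, id
	}
	e.Count++
}

func main() {
	var e Extremes
	// Illustrative instance ids and values, not real Vivint data.
	e.Observe("device-a", 41.5)
	e.Observe("device-b", 97.0)
	e.Observe("device-c", 12.25)
	fmt.Printf("min=%v from %s, max=%v from %s, n=%d\n",
		e.Min, e.MinID, e.Max, e.MaxID, e.Count)
}
```

As the quote notes, this only identifies a misbehaving device when it happens to be the most extreme outlier; anything in between is invisible to this kind of aggregate.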

Time series data typically has metadata like tags that store additional properties of the data like the application name or datacenter location for logical grouping during analysis. Is this true for Vivint's data also? Wendling replied:

While we don't send up anything other than a random 'instance id', it's currently just an unstructured slice of bytes. Theoretically you could send up whatever you wanted in there. Since the set of devices we're monitoring are mostly cheap devices in customers' homes, we don't have any GPS equipment in them or anything, but you can get reasonably close with geolocation on the IP.

Rothko’s architecture consists of a database that writes and reads a configurable number of flat files per metric using mmap, a listener that accepts metrics over the Graphite wire protocol, an approximate quantile sketch that aggregates the data, API endpoints to retrieve data and render graphs, and a frontend UI for easy human consumption. Data can be sent securely from devices to the Rothko endpoint.
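
For readers unfamiliar with the Graphite wire protocol mentioned above, its plaintext form is one line per data point: a metric path, a value, and a Unix timestamp, separated by whitespace. The following Go sketch parses such a line. It is an illustration only and not taken from Rothko; `Point`, `ParseLine`, and the metric name are hypothetical, and it assumes integer timestamps in seconds.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// Point is a hypothetical in-memory form of one plaintext Graphite line.
type Point struct {
	Path  string
	Value float64
	Time  time.Time
}

// ParseLine parses "<path> <value> <unix-seconds>", the plaintext
// Graphite format. Simplification: only integer timestamps are accepted.
func ParseLine(line string) (Point, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return Point{}, fmt.Errorf("want 3 fields, got %d", len(fields))
	}
	v, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return Point{}, fmt.Errorf("bad value %q: %w", fields[1], err)
	}
	ts, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil {
		return Point{}, fmt.Errorf("bad timestamp %q: %w", fields[2], err)
	}
	return Point{Path: fields[0], Value: v, Time: time.Unix(ts, 0).UTC()}, nil
}

func main() {
	// The metric path below is made up for the example.
	p, err := ParseLine("devices.panel.cpu_temp 48.5 1523750400\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Path, p.Value, p.Time.Format(time.RFC3339))
}
```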

"The design was kept pluggable", says Wendling, since "there are many competing standards and different workloads. For example, internally, we have our own plugin for reading metrics from our custom wire protocol. It's designed to be easy to write plugins and configure them with a toml file. Even logging and internal metrics collection of the process can be easily swapped out to match whatever you want."

Rothko was designed to handle a small number of metrics across a large set of instances. It currently handles close to 50,000 metrics and completes disk flushes for them in about 50 seconds with 500MB of RAM. The flushes happen every 10 minutes, so "it should be easy to do 500k metrics", according to Wendling. It's deployed on a single instance, and there has been no need yet to implement scaling policies like horizontal sharding.

Asked whether Vivint's team also uses any alerting mechanisms, Wendling responded that they do not, and that the focus is more on keeping an eye on dashboards. Rothko is written in Go, is open source, and is hosted on GitHub.
