
A First Look at Opserver, Stack Exchange's Monitoring Solution


Opserver is an open source monitoring solution released by Stack Exchange, of Stack Overflow fame. Unusually for the monitoring tools space, it is built on top of the .NET Framework.

Opserver aims to provide a quick overall view of each monitored system's health while allowing the user to drill down for more detail. As Nick Craver, one of Opserver's creators, told InfoQ:

We believe a monitoring system should show you systems at a high level, present what’s wrong, and allow you to drill in for more detail.

Opserver is organized around web dashboards, each one specialized for a given system. Opserver currently supports SQL Server, ElasticSearch, HAProxy, StackExchange.Exceptional and Redis. It also uses SolarWinds' Orion, a commercial tool, to provide infrastructure and network monitoring. An Opserver installation does not require all of these systems, as each can be configured on an opt-in basis.

Taking SQL Server as an example, Opserver provides high-level information on CPU and memory consumption or the overall health of databases:


Opserver's SQL Server Dashboard

Below the 10,000-foot view, Opserver provides additional data. For instance, it provides a list of the top queries, which can be sorted by several criteria (such as total duration or average CPU consumption). For each query, it provides more detailed information, including its query execution plan (a detailed breakdown of the steps taken to execute the query).


Opserver's SQL Server Top Queries

There are a few steps to take in order to set up Opserver. Besides the Opserver readme file on GitHub, some users have described their setup experiences. In a nutshell, the code must be cloned from GitHub, compiled and published to an IIS server. Some configuration is also needed, of which there are two types: security settings and per-system settings. Opserver provides an example for each settings file, based on the ones used at Stack Exchange itself. These examples can be found at <site root>\Config.

The SecuritySettings.config file is the place where items such as the authentication methods are defined:

<?xml version="1.0" encoding="utf-8"?>
<SecuritySettings provider="AD">
    <!-- Optional, these networks can see the
         overview dashboard without authentication -->
    <Network name="SE Internal" cidr="" />
</SecuritySettings>
<!--
Example of global access for everyone:
<SecuritySettings provider="alladmin" />
-->
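The idea behind the optional network entry, letting clients from listed networks see the overview dashboard without logging in, can be illustrated with a short Python sketch. This is not Opserver's code (Opserver is a .NET application); the function name, the dictionary layout and the 10.0.0.0/8 range are assumptions made for illustration only, since the example above leaves the cidr value empty.

```python
import ipaddress

# Hypothetical network list; the CIDR value is an assumed placeholder,
# not one taken from the Opserver example configuration.
INTERNAL_NETWORKS = {"SE Internal": "10.0.0.0/8"}


def can_view_overview(client_ip, authenticated, networks=INTERNAL_NETWORKS):
    """Return True if the client may see the overview dashboard.

    Authenticated users are always allowed; anonymous users are allowed
    only when their address falls inside one of the listed networks.
    """
    if authenticated:
        return True
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(cidr)
        for cidr in networks.values()
    )


if __name__ == "__main__":
    print(can_view_overview("10.1.2.3", authenticated=False))
    print(can_view_overview("192.168.1.1", authenticated=False))
```

Here an anonymous request from 10.1.2.3 would be allowed because it falls inside the assumed internal range, while one from 192.168.1.1 would have to authenticate first.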

There is one configuration file per system. The currently supported format is JSON. Here's an example of a SQL Server configuration:

{
    "defaultConnectionString": "Data Source=$ServerName$;Initial Catalog=master;Integrated Security=SSPI;",
    "clusters": [ // clusters are only available for SQL Server 2012
        {
            "name": "NY-SQLCL04",
            "refreshIntervalSeconds": 20,
            "nodes": [
                { "name": "NY-SQL03" }
            ]
        }
    ],
    "instances": [
        { // This instance cannot use the defaultConnectionString,
          // so it has to specify its own.
            "name": "NY-DB05",
            "connectionString": "Data Source=NY-DB05;Initial Catalog=bob;Integrated Security=SSPI;"
        },
        // The server name on defaultConnectionString gets replaced by "name"
        { "name": "NY-DESQL01" }
    ]
}
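The substitution rule described in the comments above (an instance either supplies its own connection string or inherits the default with $ServerName$ replaced by its name) can be sketched in a few lines of Python. This is only an illustration of the rule as the example describes it, not Opserver's actual implementation; the function name resolve_connection_strings is hypothetical, and the configuration is written as a Python dictionary because the sample file contains comments that a strict JSON parser would reject.

```python
def resolve_connection_strings(config):
    """Map each instance name to the connection string it would use.

    An instance with its own "connectionString" keeps it; otherwise the
    default is used with the $ServerName$ placeholder replaced by "name".
    """
    default = config.get("defaultConnectionString", "")
    resolved = {}
    for instance in config.get("instances", []):
        name = instance["name"]
        own = instance.get("connectionString")
        resolved[name] = own or default.replace("$ServerName$", name)
    return resolved


# The instances section of the example configuration, as a Python dict.
config = {
    "defaultConnectionString": (
        "Data Source=$ServerName$;Initial Catalog=master;"
        "Integrated Security=SSPI;"
    ),
    "instances": [
        {
            "name": "NY-DB05",
            "connectionString": (
                "Data Source=NY-DB05;Initial Catalog=bob;"
                "Integrated Security=SSPI;"
            ),
        },
        {"name": "NY-DESQL01"},
    ],
}

if __name__ == "__main__":
    for name, conn in resolve_connection_strings(config).items():
        print(name, "->", conn)
```

With this input, NY-DB05 keeps its explicit connection string to the bob catalog, while NY-DESQL01 falls back to the default pointing at master on its own server name.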

If Opserver does not cover a given scenario, there are some extensibility points to augment the tool with additional dashboards and configuration options. There are plans to make this process easier and more powerful in the future:

The biggest upcoming change as time allows is putting in a plugin model.  People will be able to add tabs, views, pollers etc. that others can use.  For example you could put a MongoDB monitoring tab up top with any level of detail you want inside.

The team also has other goals in the tool's roadmap:

It’ll also integrate heavily with our monitoring solution, keeping data history and not just real-time data. 

I plan on including functionality for other third-party tools in the base install to enhance Opserver if you’re using them.  For example, sp_WhoIsActive is already integrated, things like sp_Blitz, sp_AskBrent, and larger products like SQL Sentry will be tied in.  They’ll absolutely not be required, just add views and details if they’re there...since the information they provide will then be available.

Opserver also exposes almost all the data it has via JSON in a REST-feeling way. I plan to make all data available this way so the UI is totally optional.  This allows whomever to write scripts against routes returning JSON to use in other ways, it really opens up many use cases.

InfoQ asked why Stack Exchange decided to build its own monitoring tool. Nick told us that it grew organically:

It started out as a central exception log viewer from our StackExchange.Exceptional database, a central log location for all our applications. From there as a spare time project I started adding aspects of monitoring that didn’t exist, or didn’t exist correctly already (e.g.: an issue with SQL Server 2012's AlwaysOn monitoring).

From there I started adding SQL features for things we like to keep an eye on because I wanted a single place to view all our systems. After that, I started adding all the systems we use at Stack Exchange...the goal shifted from filling the gaps in existing monitoring to having a single pane of glass view of our infrastructure.

