State of On-Call Survey

by João Miranda on Dec 17, 2014. Estimated reading time: 2 minutes

VictorOps published the results of its survey on the state of on-call activities, which it claims is the first of its kind. The survey covers the challenges of being on call, the working context of those on call, and the trends that are shaping this part of the industry.

On-call duties have gained prominence with the rise of the Internet and its global reach, which means most sites have to be kept alive 24x7. Add trends that tend to increase deployment rates, such as Continuous Delivery and DevOps, and on-call activity becomes critical: 60% of respondents claim to be Agile, while 52% practice DevOps.

On-call duties pose challenges at the human, organizational and technological levels. Being on call can have a high impact on work-life balance. One of the respondents commented:

It affects my health due to complications of tension, and anxiety over missing family events.

60% of respondents say that things are getting only slightly better or are even getting worse. The challenges include burnout caused by not having enough people on the on-call rotation: rotations last a week or less for 72% of respondents, but more than two weeks for 22%. Respondents also complain about a lack of accountability, given that people fail to respond to calls for help more often than they should. A lack of discoverable documentation and of incident follow-ups were also identified as big problems.

Most respondents claimed to use Nagios/Icinga and New Relic for monitoring operations, although there is a long tail of other solutions. 64% of respondents estimate that up to 25% of all alerts are false alarms, which leads 63% of them to report alert fatigue. A curious but not unexpected finding is that many organizations use up to 5 monitoring services.

Respondents are mostly notified of incidents via email (82%) and SMS (57%), followed by phone calls (46%), push notifications (37%) and dashboards (31%). During incident remediation, most teams use a chat platform (72%), 1-1 phone calls (65%) and conference calls (50%). Lagging behind are wiki articles (33%), graph tools (30%) and video conferencing (24%). Only 23% use runbooks, which are sets of defined procedures to be carried out in a given context. The survey does not state whether they are automated.

Incident resolution takes between 10 and 30 minutes for 44% of respondents, while 33% said it takes them between 30 and 60 minutes. On-call teams are multidisciplinary, including operations, development and support, as resolving incidents requires different skills.

When it comes to post-mortems, 50% of respondents reportedly conduct them, but 75% of those do so only after a major outage. Somewhat encouragingly, 65% practice blameless post-mortems. Post-mortems have two purposes: helping the team learn, and giving the executive team an account of what happened.

63% of respondents said that their infrastructure is still physical (on-premises). Interestingly, 58% are using infrastructure automation tools (e.g., Puppet or Chef), but of those only 75% agree that these automation tools help with on-call duties.
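Several of the figures above are percentages of a subgroup rather than of all respondents. As a rough illustration, and assuming the reported shares can simply be multiplied (the survey summary does not publish the underlying counts), the sketch below converts them into shares of the whole sample. The function name `overall_share` is made up for this example.

```python
def overall_share(share_of_all: float, share_within_group: float) -> float:
    """Share of ALL respondents, given a group's share of the sample
    and a share measured only inside that group."""
    return share_of_all * share_within_group


if __name__ == "__main__":
    # 58% use infrastructure automation tools; 75% of those say the tools help on-call.
    helped_by_automation = overall_share(0.58, 0.75)
    # 50% do post-mortems; 75% of those do so only after a major outage.
    postmortem_only_after_outage = overall_share(0.50, 0.75)

    print(f"Automation helps on-call: {helped_by_automation:.1%} of all respondents")
    print(f"Post-mortems only after major outages: {postmortem_only_after_outage:.1%} of all respondents")
```

Under that assumption, roughly 43.5% of all respondents both use automation tools and find them helpful for on-call duties.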

All 500 people surveyed were North Americans. As for the statistical significance of the survey, VictorOps states a 95% confidence level with a margin of error of +/- 5%.
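The stated margin can be sanity-checked with the textbook formula for a sampled proportion. This is only a sketch: it assumes simple random sampling, the normal approximation, and the worst-case proportion of 50%. The article does not say how VictorOps arrived at its figure, and the function names here are illustrative.

```python
import math

Z_95 = 1.96  # z-score commonly used for a 95% confidence level


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = Z_95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


def required_sample_size(margin: float, proportion: float = 0.5, z: float = Z_95) -> int:
    """Smallest sample size whose margin of error does not exceed `margin`."""
    return math.ceil((z / margin) ** 2 * proportion * (1.0 - proportion))


if __name__ == "__main__":
    moe = margin_of_error(500)
    print(f"n=500 -> +/- {moe * 100:.1f} percentage points")
    print(f"sample needed for +/- 5 points: {required_sample_size(0.05)}")
```

Under these assumptions, 500 respondents give a margin of about 4.4 percentage points, which is consistent with the +/- 5% that VictorOps quotes.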

VictorOps is a SaaS offering that provides on-call management, incident notifications and timelines. PagerDuty and OpsGenie are other players in this space.
