Facilitating the Spread of Knowledge and Innovation in Professional Software Development

On-call Content on InfoQ


  • Lightstep Adds Incident Response to Their Observability Platform

    Lightstep has announced the addition of incident response management to their observability platform. The general availability of Lightstep Incident Response provides integrations with common collaboration tools, rotation scheduling, escalation policies, APIs, and a CLI.

  • Grafana Cloud Adds Incident and On-Call Management Solutions

    Grafana has announced the addition of incident management and on-call support to their Grafana Cloud offering. Grafana Incident, currently in preview, generates meeting spaces, integrates with Slack, and constructs incident timelines with information pulled from Grafana dashboards. Grafana OnCall provides on-call rotation scheduling and notification from connected monitoring systems.

  • Blameless Post-Mortems and On-Call Gamification at 1st DevOpsDays Portugal (Day 2)

    Ten years after the first DevOpsDays conference in Ghent, the evolution of DevOps, and of the organizations trying to adopt it, was at the forefront of the first DevOpsDays conference in Portugal. On the second day, a mix of local and international speakers covered topics such as learning from incidents without blame, gamifying on-call, modern pipelines, and more.

  • What Resiliency Means at Sportradar

    Pablo Jensen, CTO at Sportradar, spoke at this year's QCon London conference about the practices and procedures in place at Sportradar to ensure their systems meet expected resiliency levels. Jensen explained that reliability is influenced not only by technical concerns but also by organizational structure and governance and by client support, and that it requires ongoing effort to continuously improve.

  • Handling Incidents and Outages

    David Mytton, CEO at Server Density, told the devopsdays Amsterdam 2015 audience how the company handles incidents and outages. The process is grounded in a key set of principles: frequent public updates, exhaustive logging of response activities, team effort, and effective escalation. Server Density draws a lot of inspiration from the aviation industry, which is renowned for its safety procedures.

  • State of On-Call Survey

    VictorOps published the results of its survey on the state of on-call activities, which it claims is the first of its kind. The survey includes data about the challenges of being on-call, how those who are on-call get notified, the tools they use to support incident resolution, the prevalence of false alarms, the average time to resolve an incident, and more.