Incident Response Content on InfoQ

News

  • How Did Things Go Right? Learning More from Incidents at Netflix: Ryan Kitchens at QCon New York

    At QCon New York, Ryan Kitchens presented “How Did Things Go Right? Learning More from Incidents”. Key takeaways from the talk included: recovery is better than prevention; an incident occurs when there is a “perfect storm” of events, and there is no root cause; “stop reporting on the nines”, as user happiness is more important; and there is value in learning how things go right.

  • Splunk Releases Splunk Connected Experiences and Splunk Business Flow

    Data analytics organisation Splunk recently released Splunk Connected Experiences, which delivers insights through augmented reality (AR), mobile devices, Apple TV, and mobile applications. It also released Splunk Business Flow, which enables business operations professionals to gain insights across their customer journeys and business processes.

  • Scaling, Incident Management and Collaboration at New York Times Engineering

    The New York Times Engineering Team wrote about their approach to scaling and incident management against the backdrop of increased traffic during the November 2018 US midterm elections.

  • OpsRamp Announces Improved Service Centricity, AIOps and Cloud Monitoring

    OpsRamp, a service-centric AIOps software-as-a-service (SaaS) platform for the hybrid enterprise, has announced new topology maps, enhanced artificial intelligence for IT operations (AIOps) features and new monitoring capabilities for cloud native workloads.

  • Atlassian Announces Solutions for Incident Management

    Atlassian announced on September 4 that they have launched a new product called Jira Ops and that they will acquire OpsGenie. Organizations can use Jira Ops for resolving incidents and doing post-mortems to learn from them. OpsGenie adds prompt and reliable alerting to Jira Ops.

  • Google Cloud Incident Root-Cause Analysis and Remediation

    Google disclosed its root-cause analysis of an incident affecting a few of its Cloud services that increased error rates between 33% and 87% for about 32 minutes, along with the steps it will take to improve the platform's performance and availability.

  • What Resiliency Means at Sportradar

    At this year's QCon London conference, Pablo Jensen, CTO at Sportradar, talked about the practices and procedures in place at Sportradar to ensure its systems meet expected resiliency levels. Jensen described how reliability is influenced not only by technical concerns but also by organizational structure, governance, and client support, and how it requires ongoing effort to improve continuously.

  • Post-Mortems Trends and Behaviors

    Eric Siegler presented his findings at Velocity from analyzing data from 1000 post-mortems run by 125 different organizations over a six-month period. The main trends include the prevalence of blameless post-mortems; the fact that only 1 in 100 post-mortems refers to "human error"; and the finding that analyzing the lifecycle of incidents can provide useful insights into weaknesses in the incident response process.

  • Q&A with Sanjeev Sharma on His DevOpsDays NZ Keynote

    Raf Gemmail speaks with IBM's Sanjeev Sharma about his upcoming DevOpsDays NZ closing keynote on the DevOps and SRE lessons we can learn from Apollo 13.

  • Handling Incidents and Outages

    David Mytton, CEO at Server Density, shared with the devopsdays Amsterdam 2015 crowd how the company handles incidents and outages. The process is grounded in a key set of principles: frequent public updates; exhaustive logging of the response activities; and team effort with effective escalation. Server Density draws a lot of inspiration from the aviation industry, renowned for its safety procedures.
