Culture & Methods

Atlassian Announces Solutions for Incident Management

by Ben Linders on Sep 20, 2018

Atlassian announced on September 4 that it has launched a new product called Jira Ops and that it will acquire OpsGenie. Organizations can use Jira Ops to resolve incidents and to run post-mortems that help them learn from those incidents. OpsGenie adds prompt and reliable alerting to Jira Ops.

Development

Google Cloud Incident Root-Cause Analysis and Remediation

by Sergio De Simone on Jul 26, 2018

Google disclosed its root-cause analysis of an incident that affected several of its Cloud services, raising error rates to between 33% and 87% for about 32 minutes, along with the steps it will take to improve the platform's performance and availability.

DevOps

What Resiliency Means at Sportradar

by Manuel Pais on Apr 06, 2018

At this year's QCon London conference, Pablo Jensen, CTO at Sportradar, talked about the practices and procedures Sportradar has in place to ensure its systems meet expected resiliency levels. Jensen explained that reliability is influenced not only by technical concerns but also by organizational structure, governance, and client support, and that it requires ongoing effort to improve continuously.

DevOps

Post-Mortem Trends and Behaviors

by Manuel Pais on Nov 29, 2017

At Velocity, Eric Siegler presented his findings from analyzing 1,000 post-mortems run by 125 different organizations over a six-month period. The main trends include the prevalence of blameless post-mortems, the fact that only 1 in 100 post-mortems refers to "human error", and the observation that analyzing the lifecycle of incidents can reveal weaknesses in the incident response process.

DevOps

Q&A with Sanjeev Sharma on His DevOpsDays NZ Keynote

by Rafiq Gemmail on Sep 27, 2017

Raf Gemmail speaks with IBM's Sanjeev Sharma about his upcoming DevOpsDays NZ closing keynote on the DevOps and SRE lessons we can learn from Apollo 13.


Handling Incidents and Outages

by João Miranda on Jun 29, 2015

David Mytton, CEO at Server Density, shared with the devopsdays Amsterdam 2015 audience how his company handles incidents and outages. The process is grounded in a key set of principles: frequent public updates, exhaustive logging of response activities, teamwork, and effective escalation. Server Density draws much of its inspiration from the aviation industry, which is renowned for its safety procedures.
