
All Right It Failed, What Next?

by Vikas Hazrati on Jun 29, 2011

Usually failures result in anger, frustration and playing the blame game. However, failures are wasted if there is no learning from them. How can Agile teams make failures beautiful?

James Shore said that, instead of getting angry, he acknowledges that everybody is doing the best job they can.

Rather than blaming people, I blame the process. What is it about the way we work that allowed this mistake to happen? How can we change the way we work so that it's harder for something to go wrong? This is root-cause analysis.

One of the most effective ways of doing root cause analysis in the event of a failure is the 5-Why's technique, which has its origins in lean manufacturing. It finds the root cause of a problem by identifying a symptom and then repeating the question “Why?” five times. Usually the solution becomes clear after five iterations of asking why.
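As a rough illustration, the chain of questions can be recorded as a simple data structure. The sketch below is only one possible way to capture it: the class name `FiveWhys`, its methods, and the incident in the usage example are all hypothetical and do not come from any of the people quoted in this article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FiveWhys:
    """Hypothetical record of one 5-Why's analysis: a symptom plus the answers given."""
    symptom: str
    answers: List[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> None:
        # Each call records the answer to the next "Why?" question.
        self.answers.append(answer)

    def root_cause(self, depth: int = 5) -> Optional[str]:
        # The answer at the requested depth is treated as the candidate root
        # cause; None means the team has not yet asked "Why?" often enough.
        if len(self.answers) < depth:
            return None
        return self.answers[depth - 1]


# Usage with an invented incident (every answer here is made up for illustration).
analysis = FiveWhys(symptom="The site was down for five minutes")
for answer in [
    "A deployment restarted every server at once",
    "The deployment script has no rolling-restart option",
    "The script was written when there was only one server",
    "Nobody owns the deployment tooling",
    "Tooling work is never scheduled in the backlog",
]:
    analysis.ask_why(answer)

print(analysis.root_cause())
```

Five is a convention and not a rule, which is why the depth is a parameter here; a team may reach a useful cause sooner or need to go further.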

Another technique used by some Agile teams is the Fishbone diagram, which looks at the big picture around the problem. In fact, the fishbone diagram is often a very useful way to visualize the 5-Why's process. A related yet interesting technique suggested by Joel Spolsky is the 'Fix it Twice' method. It suggests having a quick solution for fixing the incident so that the team can move further and then having a slower fix, which prevents the incident from occurring again.

So what is the best way to conduct a root cause analysis?

Jim Bird suggested the following:

  • Get the right people in the room.
  • Create the right environment for blameless problem solving.
  • Don't stop until the real problems and solutions have been identified.
  • Don't be satisfied with a single root cause. Many situations are more complex than that.
  • The outcome need not be just "human error".

Likewise, Gojko Adzic quoted Douglas Squirrel, who suggested that after getting all the affected parties together, there should be a poll to identify the problems. Once the problems are identified, follow the 5-Why's technique until it hurts; if it does not hurt, you are not doing it right. The next important step is to define outcomes that are proportional to the problem.

Don’t get carried away and “retrain your development team because of five minutes of downtime”, said Squirrel, “but define tasks proportionate to the problem”. “It’s not necessary to solve problems, but make progress”, said Squirrel. Instead of gold-plating solutions, he suggested acting quickly. “If you do it wrong, it will come back again”. Solutions that take too long will never get done, so Squirrel suggested thinking about what you can do in a week or even in an hour, and building up the solution the next time a problem happens.

Jim Bird, too, suggested that the real work begins once the root cause analysis is over. It is easy for people to slip back into delivery mode and forget about the failures. However, the tasks decided on as an outcome of the root cause analysis need to be actively managed and tracked in the backlog. Metrics need to be collected, and people need to be made aware of the right way of working.

You’ll need to use metrics and cost data to drive behavior and to drive change, and to decide how much to push and how often: are you changing too much too often, running too loose; or is change costing you too much, are you overcompensating?

Thus, failures are best utilized as learning grounds. The key lies in identifying the root cause and actively tracking the 'proportional to problem' solution tasks to closure.


Built-in Self Regulation by William Louth

A related yet interesting technique suggested by Joel Spolsky is the 'Fix it Twice' method. It suggests having a quick solution for fixing the incident so that the team can move further and then having a slower fix, which prevents the incident from occurring again.


That second fix should come under software resiliency engineering, which I think is going to be a big area of concern in the coming years. It starts with software being imbued with self-observation and self-regulation capabilities that are continually extended with knowledge acquisition during incident and problem management.

Automated Performance Management starts with Software’s Self Observation
opencore.jinspired.com/?p=2709

Activity Based Costing & Metering (ABC/M) – The Ultimate Feedback Loop
opencore.jinspired.com/?p=4052

Development of information technology by xiangming hu

With scientific and technological development, information technology and many software inventions have brought a great deal to our work and life; the great Obama has also recognized this software.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.