Limitations of the Five Whys Technique in Agile Retrospectives

The Five Whys is a popular technique for root cause analysis, and many people use it in agile retrospectives. Is it a suitable technique for agile teams? Is there a better alternative?

John Allspaw, SVP of Technical Operations at Etsy, argues for discarding the Five Whys approach in his blog post The Infinite Hows. He says that using the Five Whys is a good first step toward doing real root cause analysis, but that asking too many whys ends up blaming people.

In order to learn (which should be the goal of any retrospective or post-hoc investigation) you want multiple and diverse perspectives. You get these by asking people for their own narratives. Effectively, you’re asking “how?”

Asking “why?” too easily gets you to an answer to the question “who?” (which in almost every case is irrelevant) or “takes you to the ‘mysterious’ incentives and motivations people bring into the workplace.”

Asking “how?” gets you to describe (at least some) of the conditions that allowed an event to take place, and provides rich operational data.

According to a blog post by ARMS Reliability, the Five Whys method is criticized for the following reasons:

  • Tendency for investigators to stop at symptoms rather than going on to lower-level root causes
  • Inability to go beyond the investigator’s current knowledge – cannot find causes that they do not already know
  • Lack of support to help the investigator ask the right “why” questions
  • Results are not repeatable – different people using 5 Whys come up with different causes for the same problem
  • Tendency to isolate a single root cause, whereas each question could elicit many different root causes
  • Considered a linear method of communication for what is often a non-linear event

John shares an example from his tutorials at the Velocity Conference in New York.

  • A new release disabled a feature for customers. Why? Because a particular server failed.
  • Why did the server fail? Because an obscure subsystem was used in the wrong way.
  • Why was it used in the wrong way? The engineer who used it didn’t know how to use it properly.
  • Why didn’t he know? Because he was never trained.
  • Why wasn’t he trained? Because his manager doesn’t believe in training new engineers because he and his team are “too busy”.

This causal chain effectively ends with a person’s individual attributes, not with a description of the multiple conditions that allow an event like this to happen.

John says that when we ask “how”, we ask for a narrative or story. These stories help us understand how people work. The book Behind Human Error describes the difference between “first” and “second” stories of human error:

First Stories | Second Stories
Human error is seen as cause of failure | Human error is seen as the effect of systemic vulnerabilities deeper inside the organization
Saying what people should have done is a satisfying way to describe failure | Saying what people should have done doesn’t explain why it made sense for them to do what they did
Telling people to be more careful will make the problem go away | Only by constantly seeking out its vulnerabilities can organizations enhance safety

John says that in the Five Whys example above, the questions asked frame the answers we get as first stories. When we ask more and better questions, such as “how”, we have a chance of getting at second stories.

John gave a tutorial explaining why the Five Whys approach is suboptimal and presenting the “Five How” approach as an alternative. It is in four parts of 45 minutes each.

Part I – Introduction and the scientific basis for post-hoc retrospective pitfalls and learning

Part II – The language of debriefings, causality, case studies, teams coping with complexity

Part III – Dynamic fault management, debriefing prompts, gathering and contextualizing data, constructing causes

Part IV – Taylorism, normal work, ‘root cause’ of software bugs in cars, Q&A
