Empirical Studies Show Test Driven Development Improves Quality
A paper first published in the Empirical Software Engineering journal reports: "TDD seems to be applicable in various domains and can significantly reduce the defect density of developed software without significant productivity reduction of the development team." The study compared four projects at Microsoft and IBM that used TDD with similar projects that did not use TDD.
The paper was authored by Nachiappan Nagappan (Microsoft), E. Michael Maximilien (IBM), Thirumalesh Bhat (Microsoft), and Laurie Williams (North Carolina State University), and published in Volume 13, Number 3 of the journal Empirical Software Engineering. It is also available from the Empirical Software Engineering Group at Microsoft Research.
The paper includes one case study at IBM and three at Microsoft. Each case study compares two teams working on the same product, using the same development languages and technologies, under the same higher-level manager, only one of which was using test-driven development (TDD). None of the teams knew during their development cycles that they would be part of the study. The IBM case study followed teams doing device driver development. The Microsoft cases followed teams working on Windows, MSN, and Visual Studio.
The paper describes the TDD practices used by the teams as minute-to-minute workflows, as well as task-level workflows.
- Write a small number of new tests
- Run the tests and see that they fail
- Implement code to satisfy the tests
- Re-run the new unit test cases to ensure they now pass
- Integrate new code and tests into the existing code base
- Re-run all the test cases to ensure the new code does not break anything
- Refactor the implementation and/or test code
- Re-run all tests to ensure that the refactored code does not break anything
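The steps above can be sketched with a small example. This is only an illustration, not code from the paper: the function `is_leap_year` and the test class `LeapYearTest` are hypothetical names chosen for the sketch, and Python's standard `unittest` module stands in for whatever test frameworks the studied teams actually used.

```python
import unittest


# Steps 1-2: the tests are written first. Run before is_leap_year exists,
# every test fails with a NameError, which confirms the tests can fail.
class LeapYearTest(unittest.TestCase):  # hypothetical test class
    def test_year_divisible_by_4_is_leap(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    def test_year_divisible_by_400_is_leap(self):
        self.assertTrue(is_leap_year(2000))

    def test_ordinary_year_is_not_leap(self):
        self.assertFalse(is_leap_year(2023))


# Steps 3-4: implement just enough code to make the new tests pass,
# then re-run them.
# Steps 7-8: a first version might use nested if statements; after
# refactoring to this single expression, the same tests are re-run
# to confirm the behavior did not change.
def is_leap_year(year: int) -> bool:  # hypothetical function under test
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

Saved as a file, the tests can be run with `python -m unittest <file>`. Steps 5 and 6 (integrating and re-running the full suite) happen at the code-base level and are not shown here.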
The pre-release defect density of the four products, measured as defects per thousand lines of code, decreased between 40% and 90% relative to the projects that did not use TDD. The teams' management reported subjectively a 15–35% increase in initial development time for the teams using TDD, though the teams agreed that this was offset by reduced maintenance costs.
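To make the metric concrete, here is a minimal sketch of how defect density and its relative decrease can be computed. The defect counts and code sizes below are invented purely for illustration and are not figures from the paper; the function names are likewise hypothetical.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


def relative_decrease(baseline: float, improved: float) -> float:
    """Return the percentage by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Hypothetical numbers, NOT taken from the study:
non_tdd = defect_density(defects=50, lines_of_code=20_000)  # 2.5 defects/KLOC
tdd = defect_density(defects=15, lines_of_code=20_000)      # 0.75 defects/KLOC

print(f"Relative decrease: {relative_decrease(non_tdd, tdd):.0f}%")  # prints "Relative decrease: 70%"
```

With these made-up inputs the decrease is 70%, which would fall inside the 40% to 90% range the paper reports.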
These results can be compared to those found in a paper published in 2006 by Maria Siniaalto. That paper attempted to review and summarize the results from 13 other studies on test-driven development, including research conducted in industrial, semi-industrial, and academic contexts. Among the conclusions of the paper, the author wrote:
Based on the findings of the existing studies, it can be concluded that TDD seems to improve software quality, especially when employed in an industrial context. The findings were not so obvious in the semiindustrial or academic context, but none of those studies reported on decreased quality either. The productivity effects of TDD were not very obvious, and the results vary regardless of the context of the study. However, there were indications that TDD does not necessarily decrease the developer productivity or extend the project leadtimes: In some cases, significant productivity improvements were achieved with TDD while only two out of thirteen studies reported on decreased productivity. However, in both of those studies the quality was improved.
What are your experiences with TDD? Have you seen an increase in quality? What effects have you seen on developer productivity, and development time? Leave a comment and share your experiences.
Of course, in reading this paper I do question a tad how much everyone involved in the study *truly* gets TDD. But the truth is, it may be even more useful for this to come from "non-expert", non-perfect (in a TDD sense) angles, since most organizations searching for pudding-proof data aren't likely to have "experts" or perfect situations anyway (who does?) - and judging from this data, they can still see great benefit!
As a companion to these "hard numbers", I've put together a quick "elevator speech"-ready bullet list of what I most often cite as the primary reasons you can't survive without TDD:
Reflects My Experience
However, once the project is running, major defects start to shrink away. This leaves your end-of-iteration testing to focus on the tests that simply could not be automated. Consequently, this minimizes the Quality Assurance resources required for the project.
After a couple of iterations, no one notices TDD as being 'slow'. Instead, everyone comes to love the confidence it gives them. With TDD, there are fewer opportunities to say, "Ahh. I don't really need to test that. It's so straightforward." This kind of anti-TDD thinking adds up and eventually costs.
With TDD, you can realistically achieve extremely high levels of tested code coverage. 70% or more is expected. 95% or higher is very achievable. This results in high levels of confidence in your code base.
In my experience, besides being less error-prone, TDD also results in code that is better factored and more maintainable. Forgot how some esoteric piece of code is supposed to function? Go to the test; it's almost guaranteed to exist.
With TDD, you know the risk of being blindsided by an unintended consequence of your code change is small. This confidence, in turn, reduces the size of the estimate you can give for new features.
It's a big win, all around. Glad there's empirical evidence I can now point to, to prove it.
Research is too expensive to access
Reduced developer time.
1) The team with the lowest code coverage has the best improvement in defect rates. This is precisely the opposite of what TDD suggests. The study's authors do not discuss this.
2) The study is incomplete. The conclusion is that TDD has lower defect rates, but it takes 15% to 35% longer to write the code. If the non-TDD teams were given 15%-35% more time to test and debug their code, what defect density would they have reached? This question is not asked.
Without either question answered, it's hard to really conclude anything about the data.
Oh? Re: Deeply flawed
Point 2) It *could* take 15 to 35% longer to write the code. It usually takes more time to write the code because the TDDers are developing tests. If you don't write tests and don't bother to write or design for automated unit tests, it takes less time to churn out code. So they finish earlier. Their process doesn't allow for test automation, only for marching to code complete. There's nothing for them to do after they write the code. They don't have anything to do with the time.
I don't agree that it's inconclusive. There are other studies that reach the same conclusion: confessionsofanagilecoach.blogspot.com/2011/09/...
Re: Oh? Re: Deeply flawed
It did not cover the question of whether it helped with quality as perceived by the user, who clearly wants fewer defects but may also see quality as ease of use, data accuracy, ease of maintenance, etc.
Similarly it does not cover the situation I find myself in on most projects where I want quality across multiple systems that integrate together ... Does TDD improve testing across complex, integrated solutions?
As an MBA guy myself I would want to see a study across multiple projects that assessed the impact of a combination of techniques such as service driven design, TDD and refactoring.
Until then my mantra for project managers is this ... If you are a plumber you can tell the plumber fixing your bathroom which tools to use. If you are not a plumber then you should trust his or her judgement in how to use their toolkit.
Similarly, I hire developers who are good at their craft (or trade), and so I trust them to choose which tools to use on the job. If I can't trust them to do that, I need to hire developers I trust or I need to learn to code as well as they can.
In fact I would prefer to rely on their judgement rather than on my understanding of even a sophisticated piece of research that may or may not apply to the plumbing I need fixed.
Most of my developers tell me TDD is great for development and maintenance of object-oriented IT systems, once the team gets good at writing good tests. So I guess it probably is.
They also tell me it is not related to improved usability, database optimisation or some types of coding, so I guess it's not.