Does TDD Really Ensure Quality?

There's been some interesting commentary on the National Research Council of Canada's paper titled "The Effectiveness of Test-first Approach to Programming" . The study, carried out on a sample size of 24 IT graduates, adds to the growing body of research on the topic. Though TDD is accepted as an excellent learning tool for quickly understanding the domain in which developers work, the question of whether TDD directly correlates quality in software is still considered unproven by some. This study, while still not conclusive, does show some interesting results - though different results, depending on who's analysing them.

The study's abstract reads, in part:

Test-Driven Development (TDD) is based on formalizing a piece of functionality as a test, implementing the functionality such that the test passes, and iterating the process. This paper describes a controlled experiment for evaluating an important aspect of TDD: In TDD, programmers write functional tests before the corresponding implementation code.

The experiment was conducted with undergraduate students. While the experiment group applied a test-first strategy, the control group applied a more conventional development technique, writing tests after the implementation. Both groups followed an incremental process, adding new features one at a time and regression testing them.

The researchers noted: "The results of the experiment support an alternative theory of the Test-First technique that is mainly centered on productivity rather than on quality."

Our main result is that Test-First programmers write more tests per unit of programming effort. In turn, a higher number of programmer tests lead to proportionally higher levels of productivity. Thus, through a chain effect, Test-First appears to improve productivity.

... We also observed that the minimum quality increased linearly with the number of programmer tests, independent of the development strategy employed.

However, one blogger, Jacob Proffitt, a self-described "passionate developer, sometimes manager, and general all-round techno-geek," probed the paper and blogged his critique of it., proposing that the paper shows a strong tendency toward confirmation bias - i.e. coming to conclusions in spite of the findings of the work. He believes that "TDD’s relationship to quality is problematic at best," citing:

The control group (non-TDD or "Test Last") had higher quality in every dimension—they had higher floor, ceiling, mean, and median quality.

The control group produced higher quality with consistently fewer tests.

Quality was better correlated to number of tests for the TDD group (an interesting point of differentiation that I’m not sure the authors caught).

The control group’s productivity was highly predictable as a function of number of tests and had a stronger correlation than the TDD group.

Jacob proposes that the only facts this study's data tells us are:

The test-first students on average wrote more tests.

Students who wrote more tests tended to be more productive.

The minimum quality increased linearly with the number of tests.

Hakan Erdogmus, editor of IEEE Software Magazine and co author of the original paper, views these points from a different perspective:

A single study, especially a small one like ours, regardless of how well conducted, does not prove or disprove anything. The observations at best shed light to a small part of a large puzzle. In many circumstances, they raise more questions than they answer, hopefully more relevant questions that improve our understanding of the phenomenon under study ... In fact, "proof" is not part of the empirical software engineering terminology. Strength of collective evidence and building refutable theories are the best we can achieve by studying a specific technique. While for certain few practices, notably software inspections, we are now able to state that the evidence is strong. But the jury is still out for TDD.

More so, Hakan told InfoQ this about the wider TDD discussion in context of the breadth of research done so far:

The 23 TDD studies published between 2001 and early 2008 provide somewhat conflicting results, but a big picture is emerging on closer inspection. The differences in findings stem from the multiplicity of context factors that influence the outcome variables measured. On the quality front, the results are more compelling, if not resoundingly in agreement. Of the 22 studies that evaluated some aspect of internal or external quality with vs. without TDD, 13 reported improvements of various degrees, 4 were inconclusive, and 4 reported no discernable difference (including our study). Only one study reported a quality penalty for TDD.

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Applied Research topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter