Analyzing Experimental Data Concerning Agile Practices

by Amr Elssamadisy on Oct 04, 2007
It is not unusual in conversations concerning the effectiveness of Agile development practices for someone to say, “Professor X at famous University Y ran an experiment proving that Agile practice Z is 20% more effective than traditional development practices.” We then take that as truth because – of course – it must be correct. Unfortunately, most published experiments have results that should not be generalized to real-world development projects. Fortunately, it is not difficult to quickly determine how much confidence you should (not) have in experimental results.

There are several validity criteria that will quickly enable you to determine if you should expect the same results as those reported in an experiment:

External validity – also known as generalizability – helps you determine if the results of an experiment are applicable to other situations. Does a pair programming experiment with students generalize to professional developers?  In simple terms – no.  If you are in a business environment, an experiment on students is not applicable because they are in a completely different context, are building different types of software, and have different experience.  The context of the experiment should closely match the context of the real-world application.

Internal validity – is there true cause and effect between variables? For example, does pair programming improve code quality? If a group is pair programming, writing tests first, and taking longer to build the application – can we reasonably assume that it is pair programming that improved the quality – or are there other explanations? For example, could the fact that they spent more time building the application have made the difference?
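To make the confounding problem concrete, here is a minimal sketch using made-up numbers (the teams, schedules, and defect counts are all hypothetical, not taken from any study). Overall, the pairing teams look better, but once teams are compared within the same schedule length the difference disappears, which suggests the extra time, not pairing, may explain the result.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical teams: (used_pairing, schedule, defects found after release).
# All values are invented purely for illustration.
teams = [
    (True, "long", 2), (True, "long", 2), (True, "long", 2), (True, "short", 6),
    (False, "long", 2), (False, "short", 6), (False, "short", 6), (False, "short", 6),
]


def mean_defects(rows):
    """Average defect count over a list of team records."""
    return mean([defects for _, _, defects in rows])


# Naive comparison: ignore how long each team worked.
overall = {
    paired: mean_defects([t for t in teams if t[0] == paired])
    for paired in (True, False)
}

# Stratified comparison: compare pairing vs. not pairing within each schedule.
by_schedule = defaultdict(dict)
for schedule in ("long", "short"):
    for paired in (True, False):
        rows = [t for t in teams if t[0] == paired and t[1] == schedule]
        by_schedule[schedule][paired] = mean_defects(rows)

print("overall:", overall)                # pairing teams average 3 defects, others 5
print("by schedule:", dict(by_schedule))  # identical averages within each schedule
```

In this toy data most pairing teams also had the long schedule, so the naive comparison credits pairing with an effect that belongs to time spent.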

Construct validity – is there a correspondence between your measurements and the concepts (constructs) under study? Does the measure being used, for example cyclomatic complexity, really indicate the quality of the concept being evaluated – in this case, design?
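As a rough illustration of why a metric may not capture the construct, the sketch below computes a simplified approximation of cyclomatic complexity (one plus the number of branching constructs) for two hypothetical functions. The counting rule and both code samples are assumptions made for this example, not the method used in any cited experiment. Both functions receive the same score, even though most readers would judge one design clearer than the other.

```python
import ast

# Node types treated as decision points in this simplified approximation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_cyclomatic_complexity(source):
    """Rough estimate: 1 + number of branching constructs in the source."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes two extra decision points.
            complexity += len(node.values) - 1
    return complexity


# Two hypothetical implementations of the same rule.
clear = '''
def shipping_cost(order):
    if order.is_express:
        return 20
    return 5
'''

murky = '''
def f(o):
    x = 5
    if o.e:
        x = x * 4
    return x
'''

print(approx_cyclomatic_complexity(clear))  # 2
print(approx_cyclomatic_complexity(murky))  # 2
```

The metric cannot tell these apart, so a study that reports "design quality" through this number alone is measuring branching, not necessarily design.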

Statistical validity – is the sample size big enough, and are the results ‘statistically significant’? If you read of an experiment in which real developers worked for a week and showed an improvement in design quality when using TDD, can we count on the results? In this case, no. A week does not yield enough data to extrapolate the effectiveness of TDD to a multi-month or multi-year project.
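A small sketch of the sample-size problem, again with invented numbers: two hypothetical groups of four developers whose average defect counts differ by about 30%. An exact permutation test (one common way to check significance, chosen here as an assumption rather than the test any particular paper used) asks how often a difference at least this large would arise if the group labels were assigned at random.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Try every way of relabelling n_a of the pooled values as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / count


# Hypothetical defect counts per developer (made-up numbers).
tdd = [4, 6, 5, 3]
control = [5, 7, 6, 8]

p = permutation_p_value(tdd, control)
print(round(p, 3))  # 0.143, i.e. 10 of the 70 possible relabellings
```

Despite the apparently large gap between the group averages (4.5 versus 6.5), a result this extreme shows up in one of every seven random relabellings, well short of the conventional 5% threshold. With so few participants, a headline percentage says very little.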

An experiment that was run to evaluate the effectiveness of TDD (in speed and design quality) can be found here. This experiment was run with professional developers developing 200 lines of code. A reader who is aware of the different types of validity can easily see the error of assuming the results apply to projects of thousands (or millions) of lines.

A critical look at a pair programming report which claimed that pair programming was 15% faster than the alternative can be found on hacknot.

The fact is, the requirements needed for an experiment that can be generalized to real-world projects are prohibitively expensive. Experiments with students can only be generalized to other students. Experiments using professional developers for only a limited amount of time have results that cannot be generalized to long-running development projects. If you have cited experimental results before, take a quick re-read of the paper in this new light and share your thoughts.

What does Agile have to do with this? by Jeff Santini

I totally take your point on being aware of the applicability of experiment results to a real world situation, but don't your points hold true for non-Agile Practices as well? I think they will even hold true for any issue which can be experimented with.

Why did you only use examples relating to Agile practices?

Re: What does Agile have to do with this? by Amr Elssamadisy


but don't your points hold true for non-Agile Practices as well?


You are absolutely right Jeff, these points hold for any experiments.

The main reason I brought it up on the Agile Queue is that, to date, all of the discussions I've had with colleagues around Agile experiments have involved misreadings or misrepresentations of experimental data. I figured that I'm probably not alone in this, so I decided to share this information with others who have found themselves in my situation.

About the TDD report by Darren Tarbard

"The experiment results showed that TDD developers took more time (16%) than control group developers."

They didn't, however, take into account that there are very likely to be fewer bugs in TDD code than otherwise, and that it is more efficient to fix bugs as soon as they occur rather than further down the line.

A more effective approach - studying real-life projects by John Rusk

The most effective approach I've heard of is to study real-life projects. You need to study a lot of them to make sure your conclusions are valid. Alistair Cockburn did this, over a period of 10+ years. It's interesting reading how his working hypotheses about what makes teams successful changed during that period. I've blogged about his research here: agilekiwi.com/scientific_experiments.htm

Re: About the TDD report by Amr Elssamadisy


They didn't, however, take into account that there are very likely to be fewer bugs in TDD code than otherwise, and that it is more efficient to fix bugs as soon as they occur rather than further down the line.


You are absolutely right, but to do that with an experiment means that you need an extended amount of time (months? years?) and some way to evaluate design.

Experiments are notoriously hard to run correctly and extremely expensive. So maybe we shouldn't rely on them, and should instead find some other way to document things.

Patterns maybe?

Re: A more effective approach - studying real-life projects by Amr Elssamadisy

There is significant research out there, and Cockburn's work is one of many. They are not experiments, however. <plug>For example, there is the Agile Adoption Patterns book.</plug>

These are orthogonal subjects.

Re: A more effective approach - studying real-life projects by John Rusk

Amr,

Your comment is correct about the typical agile book, which does not have any extensive scientific evaluation behind its recommendations.

However, there is a big difference between a book that explains how to do agile practices, and research that actually evaluates the effectiveness of the practices. Most books are in the first category; Cockburn's work is in the latter.

If you are suggesting that his work is no different from other "how to" books, then I take issue with that suggestion. If you have time to read his research (the doctoral dissertation is linked from my page above) you'll see that it contains a range of research techniques, including "field experiments, surveys, case studies, action research" and several other techniques. Those techniques, the rigour with which they were applied, the duration of the research (10 years), and the large number of projects evaluated all distinguish Cockburn's work from the typical agile book.

Re: A more effective approach - studying real-life projects by Amr Elssamadisy


However, there is a big difference between a book that explains how to do agile practices, and research that actually evaluates the effectiveness of the practices. Most books are in the first category; Cockburn's work is in the latter.


Let me first say that I am not directly comparing my patterns work to Cockburn's research thesis, but offering a commonality. With that said, a cursory reading of the thesis says:

1) "The more carefully controlled were experiments I ran in courses on object-oriented design and on use case writing."

and 2) "The less formal experiments were those I used on live projects. Due to the delicacy of staying within the boundaries of project success, I had only a limited range of experimentation and limited measurement devices."


What you bring up, John, is exactly the point I'm trying to make in this article. We should not be intimidated by the words 'PhD', 'thesis', or a very well known name.

So, from the article, Cockburn's formal experiments fail external validity. The informal ones fail statistical validity, and have weak internal validity because they are not true experiments - no control groups and many confounding variables (for the same reason).

This does not mean that Alistair Cockburn's work is not important, but it does mean that they are not experiments and should not be treated as such.

Finally, the point of similarity between the patterns work and the PhD thesis is observation over many projects over long periods of time. The fact that I did not say 'the patterns work is based on X years of practice' does not mean it is not valid, only that I try to keep my comments from being too much of an advertisement ;)

Re: A more effective approach - studying real-life projects by John Rusk

Good point. The bulk of Cockburn's research uses a range of research techniques other than pure experiments, e.g., case studies and other techniques to gain insight into a large sample of projects. That's often the best we can do. And if the approach is applied over a large enough number of projects, we can have reasonable confidence in the results.

It's certainly better than the alternative of, "Here's what we did on x projects", where x is some value less than, say, 10. Or, "Some smart people talked about it, and here's what they said you should do". Books written under those models can, and should, be questioned.

I think the important question is not, "do we have experiments?", but, "do we have evidence obtained in a rigorous manner from a large number of projects?". It was the latter that I tried to get across in my own post on this subject (even though I unwisely used the word "Experiments" in the title :-) ).

By the way, Amr, I haven't read your own book yet. What kinds of projects did you study? Those that were successfully doing XP? Those that were succeeding (and failing) with other processes too (e.g. other recognised agile processes, ad-hoc processes, non-agile, etc.)?

Re: A more effective approach - studying real-life projects by Amr Elssamadisy


I think the important question is not, "do we have experiments?", but, "do we have evidence obtained in a rigorous manner from a large number of projects?". It was the latter that I tried to get across in my own post on this subject (even though I unwisely used the word "Experiments" in the title :-) ).


Agreed! I was just trying to say that people should take a closer look before citing work :)


By the way, Amr, I haven't read your own book yet. What kinds of projects did you study? Those that were successfully doing XP? Those that were succeeding (and failing) with other processes too (e.g. other recognised agile processes, ad-hoc processes, non-agile, etc.)?


Since you asked...

This is based on many different sources:

1) My own experience with Agile practices since late 1999 on the very first XP project at ThoughtWorks. So that gives me about 8 years of Agile practice with many different teams.

2) Some academic work I did at UMass Amherst with the LASER group (lab. for advanced software eng. research) where I did some research on the feasibility of experiments while looking for non-anecdotal evidence of the effectiveness of agile methods. (This is when I first took a critical look at existing research.)

3) 2 years of pattern mining with others in the field at different workshops and conferences including 2 ChiliPLoPs, 2 XP Conferences, a PLoP conference, an XPDay conference, and many informal conversations with friends and colleagues in the field.

Finally - I'm working on an expanded version with Addison Wesley which should be out (hopefully) in Q1 2008.

:)
