InfoQ Homepage News Experiment Driven Development - The Post-Agile Way

Experiment Driven Development - The Post-Agile Way

This item in japanese

Feb 25, 2010 12 min read

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.Get in touch

TDD and BDD are now widely-used software development techniques. However, solely following BDD & TDD may still lead to missed business opportunities, or worse, a negative impact to the business. Two questions which TDD & BDD are unable to answer are: How do you measure the usage of your application? How do you get feedback from your customers?

The traditional method of conducting user surveys is not always convincing, may be time-consuming for both the provider and the client, and can often suffer from bias. Nathaniel Talbott's original idea in his presentation at RubyConf 2009 was that the business should gain feedback in a similar way as how it is done with TDD in development:

(Image from Labnotes)

A major issue in software development is identifying the right problem to solve in order to avoid a "Waste Producing Machine". For this you need a way to measure facts instead of opinions (and egos) - then, you will be better able to measure usage of your application in production:

TDD is about design and verification of code. EDD checks out whether the business works by tracking goals.

EDD frameworks are based on A/B testing, which has an origin as a marketing method in which a baseline control sample is compared to a variety of single-variable test samples in order to determine which of two choices will improve response rates.

Assaf Arkin, the author of Vanity (an EDD framework), describes EDD:

EDD is fact-based software development. It’s the opposite of developing software out of opinions backed by anecdotes based on stories CEO tell about their cousin in-law. EDD starts out with ideas and then measures them against real people: your customers, visitors to your site, dog-food eaters. It seeks proof, and it’s iterative.
Where TDD and BDD offer tools that help you improve code quality and make sure you code behaves according to spec, EDD helps you find out what features to develop and where to take them: it helps you discover what will become the spec.

With the Rails plugin Vanity, A/B testing can be conducted by following five steps:

Defining an A/B Test:

# experiments/price_options.rb
ab_test "Price options" do
  description "Mirror, mirror on the wall, who's the better price of all?"
  alternatives 19, 25, 29
  metrics :signup
end

Presenting the different options to users:

<h2>Get started for only $<%= ab_test :price_options %> a month!</h2>

Measuring conversion using the track! method:

class SignupController < ApplicationController
  def signup
    @account = Account.new(params[:account])
    if @account.save
      track! :signup
      redirect_to @acccount
    else
      render action: :offer
    end
  end
end

Generating a report:
(Image from Labnotes)
Measuring the efficiency of the forms by observing the collected data:
(Image from Labnotes)

InfoQ spoke with Assaf Arkin to learn more about EDD:

InfoQ: How did you come up with EDD? After playing around with A/Bingo for your new project apartly? Through Nathaniel's presentation?

Nathaniel came up with EDD, I just had too much coffee.

A few weeks before RubyConf, Nathaniel stopped in SF and we met for lunch. Back then I was in the middle of setting up a couple of experiments for Apartly. I used A/Bingo for the split testing, Google Analytics for some metrics, database queries for others.

Imagine data coming from three different places that I have to stitch together to create reports. Not easy because Google Analytics doesn’t know about A/B tests, A/B tests don’t now about Google Analytics. There’s also the issue of testing code paths for each experiment. How do you test your experiments and metrics?

Over coffee, Nathaniel gave me the elevator speech for his upcoming presentation. It sounded like what I was doing, except he had the methodology, while I was supergluing and duct taping.

Then he got to the meat. EDD is a conceptual framework, a way of thinking about, building, measuring and refining code through experiments. What if you had an actual framework that did all the heavy lifting, so you can write experiments with a few lines of code? I heard it and immediately had this nagging voice in my brain going “WANT!”

Since the voice in my head didn’t go away, I ended up writing a framework in the short time I had until RubyConf. It had to be a minimum viable EDD framework, with reasonably good documentation, and used in production for a few days. I don’t expect people to use my code unless I trust it in production.

That’s the story of how Vanity came to be.

InfoQ: What Drives your Development the most nowadays? Test? Behavior? Experiment? Do the ratios evolve with the project maturity?

I use EDD and TDD together, can’t imagine using one without the other.

We’re a small startup with a lot of ambition and a lot of ideas. It’s going to take a while to distill these ideas into the perfect product and a viable market. Some of our early intuition will prove right, some will need fine tuning, and some ideas will prove off-base. That’s the nature of startups, the key to success is iterating fast enough to find the perfect product/market fit before you run out of money.

So you code like a ninja. Minimum effort to put something out there, so you can test out whether your intuition was right, validate your ideas.

We don’t have time to over-engineer code for all the “just in case” scenarios. Most likely six months from now we’ll make some change in direction, because the market tells us to, and all of a sudden we’re dumping features we no longer need. That’s when you’re thankful you didn’t over-develop them.

The other aspect of being lean is getting rid of excess inventory. We can’t afford to have crap code and dead features weighing us down. We remove features just as quickly as we add them.

Keeping light on our feet gives us freedom to experiment with different ideas, because there are less consequences to getting any of them wrong. If you’re not doing heavy up-front development or committing to years of maintenance on each feature, you can easily try out different things. If one experiment doesn’t work you throw it away and try something else.

To make that happen code has to be easy to change, easy to troubleshoot, and reliable enough to deploy continuously. You need a good test suite watching your back. Our code-to-test ratio right now is 1:3.4. We’ve got unit, functional and integration tests running continuously in the background thanks to the wonderful Autotest, a continuous integration server to weed out “works on my machine” bugs, and staging server to tackle deployment issues.

So these lines of defense prevent us from unleashing broken code on the world. That’s the role TDD plays in controlling code quality, and enabling us to experiment with new ideas and make rapid changes to the product. But how do you decide which changes to make? Which ideas are worth pursuing?

You start with intuition, but intuition is never enough. When you read about startups in the NYT it always sounds like the founders had one good idea and immediately struck gold. It’s a feel good story. In practice, founders come up with countless ideas, most of which are blah, a few are okay, and only the good ones get remembered. Successful startups listen to the market and follow on the few winning ideas.

We develop in iterations, but our iterations are not about completing the next feature or advancing down the feature list. The iterations are about trying something out, learning a bit more about our customers, and using that knowledge to decide what will happen in the next iteration. Our progress is measured by what Eric Ries calls “validated learning”.

What features does your audience care about the most? What changes would make them happier? What features nobody cares about and you can safely remove? If there are two ways of doing the same thing, which would you pick?

EDD is a way to answer these questions. It tests how well an idea works in practice by putting it out there and measuring responses. EDD and TDD are complementary. Without TDD we wouldn’t be able to do just-in-time development, rapid iterations and run multiple experiments. Without EDD, we could end up developing high-quality bullet-proof code that nobody cares about. EDD helps us find the secret ingredient to an awesome software.

InfoQ: Do you think EDD can be adapted to any kind of software development? Do you think it can be applied to over-engineered Java solutions or to applications like the ones you were previously developing at Intalio? Isn't it targeting only highly-visited websites?

There’s an impression that A/B testing is about funnels and conversions and landing pages. Marketing people have used statistical approaches for years to analyze and segment markets. They brought these practices with them to the Web, to the point where you can’t escape reading about A/B testing and landing page optimization.

There’s more to A/B testing than sign up forms. In fact, most of your experiments have nothing to do with landing pages or marketing.

As a software developer, our job goes beyond building features and making sure they work. It’s about building features that matter, that people use and benefit from. If you’re a software developer, do you just build to build, or do you build for an audience? Would you write code you know no one will ever bother to use, or rather write something that matters to many people?

Many of us have side projects because it lets us own one piece of software where we are fully responsible and accountable for every decision we make.

I can summarize our development process at Apartly with one question: “will it move the needle?”

There are several metrics we’re interested in, like signups (acquisitions), invites (referrals), subscriptions (revenues) and so forth. Those are the needles. Everything we do with the limited resources we’ve got must make some difference on one of these metrics. Maybe we’ll get more signups, maybe more returning visits, maybe more high five’s on Twitter.

Not only does everyone have access to these metrics, but they’re built into the development process. Metrics and experiments are part of the code base, they’re checked into source control, they’re run through testing and staging.

Just like test-first has you writing a test that fails, then the code necessary to make the test pass, you can start with a baseline metric and write code that will make that metric change in a favorable direction. (Generally up, but some metrics like WTFs/minute you want to drive down)

Can you apply that outside of high-traffic Web sites? There’s always a metric that makes a difference. When you replace synchronous interaction with one that goes through message queue, does it improve response time? Decrease number of times server crumbles under load? Make it easier to deploy new services? In short, is there any measurable effect, or was it wasted effort that looked good on paper?

What happens when you don’t have any way of measuring results is that a lot of development starts because the problem sounds interesting, and continues because there’s no way to prove it’s ineffective. And you already have sunk costs. Once you introduce metrics, the definition of what’s interesting changes. All of a sudden it’s interesting because you can see a difference.

No. 2 on Nathaniel’s list of requirements for an EDD framework is “accessible at all levels”. Not just user-facing views. You want to measure at every layer of your software stack. And every components needs to be accountable, prove its worth.

InfoQ: It's hard to measure ROI for TDD and BDD, and results may not appear immediately (and certainly not in realtime). At the end it's not always easy to convince managers and decision makers of the benefits of TDD or BDD. Is it possible that, by providing direct and quantifiable data, EDD will be favoured by managers?

Sadly, I don’t think EDD will make your PHB less pointy, less hairy or less bossy.

IT departments, and by extension many business and enterprise software companies, have a monopoly on the user base. When you don’t have to listen to users, you sometimes develop a culture around clients: the person or title who hands you a check. The client is always right, and the client has no tolerance for any changes they didn’t instigate, so they expect to see a roadmap and to see continuous progress from 0% to 100% done. The classic waterfall approach.

At the other end you’ve got companies and projects where you don’t have the luxury of forcing your software on people. You have to court your users, delight them, make their life better in some way. Your measure of success is not how many features you completed by end of fiscal year, but how many people you won over.

When you measure success in features delivered, your guiding star is the carefully crafted Gantt chart. Blow it up and post it across the largest wall in the office. What do you do when your measure of success is customer delight? Have marketing circulate their quarterly board reports to the entire company? That’s a 3 month latency between delivery and feedback.

Some companies design software around an imaginary user archetype they invent during a role-playing exercise. Other companies already figured out the proverbial “eat your own dog food”, some only develop software they also use internally. But what if your market is not software developers?

If you understand how software works, you know the futility of measuring productivity by lines of code. Measuring productivity by features or story points is not much better. In either case you’re measuring and optimizing for second order effect. Productivity tells you nothing interesting.

Instead, pick the business metrics that matter most. Dave McClure dubbed them “metrics for pirates”: Acquisition, Activation, Retention, Referral and Revenue. Get the smartest people in your company, that includes the development team, to figure out how to improve acquisition, up retention, increase revenues.

It’s not just having quantifiable return on business. If quantifiable returns have to trickle from marketing to dev managers down to team leaders, then you have a bandwidth and latency issue. It’s about putting developers directly on the front line, baking metrics into the process, and measuring key business goals.

I call that “post Agile” because it build on the great foundation of Agile, but displaces “working software as primary measure of progress” with “validated learning and key metrics”. EDD is to post-Agile what TDD is to Agile.

Will EDD be the norm in a few years? Does it provide tangible benefits, or is it just another *DD acronym? What do you think?

InfoQ Software Architects' Newsletter

Experiment Driven Development - The Post-Agile Way

Write for InfoQ

Rate this Article

This content is in the Agile topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter