Approval testing is a testing technique that compares the current output of your code with an "approved" version. The approved version is created by initially examining the test output and approving the result. You can revisit the approved version and easily update it when the requirements change. Approval testing is supported by TextTest, an open source tool for text-based functional testing.
Emily Bache, a Trainer, Software Developer and Architect, gave a workshop about approval testing with TextTest at the European Testing Conference 2017. InfoQ is covering this conference with Q&As, summaries and articles.
Bache started her workshop by explaining the concept of approval-based testing. She showed how you can set up a test in TextTest and run it for the first time to generate output data.
The workshop attendees practiced approval-based testing with TextTest, defining new test cases and running them. They checked the output that was generated the first time a test was run to see if the program worked correctly. If the output was OK, they approved it, which marked the test as "passed".
If the output was not OK, they had found a defect. Instead of approving the faulty output as it stands, the tester can correct it or leave a note in it for the developer, and then approve that version, said Bache. TextTest will then mark the test as "failed", since the generated output differs from the approved output. When the program is updated and the bug removed, the test can be run again and it will pass.
InfoQ spoke with Emily Bache about the approval testing technique and asked her how TextTest can be used for approval testing.
InfoQ: What is approval testing?
Emily Bache: The basic idea is that you determine whether your test has passed by comparing the current output of your code with an "approved" version gathered from a previous test run.
So the first time you run a new test, you don’t have anything approved to compare against. You probably know roughly what to expect, perhaps you have a sketch, or some notes from a user story conversation to remind you what should happen. You examine the output from your code by hand and determine whether it is good enough. You might do some calculations or show it to an expert user if you’re unsure. Once you decide the behaviour is worth keeping, you "approve" the output and store it. Then all subsequent runs compare the actual output against this approved version. Often you use a simple textual diff tool to do the comparison, and any difference will fail the test.
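As a concrete illustration of the workflow Bache describes, the sketch below shows a minimal approve-then-compare check in Python. It is not TextTest; the `verify` helper and the `*.approved.txt` / `*.received.txt` file names are hypothetical, chosen only to show the first-run and subsequent-run behaviour.

```python
# A minimal sketch of the approve-then-compare workflow described above.
# This is not TextTest; the verify() helper and file naming are hypothetical.
import difflib
from pathlib import Path

def verify(test_name: str, actual_output: str, approved_dir: Path = Path("approved")) -> None:
    approved_file = approved_dir / f"{test_name}.approved.txt"
    received_file = approved_dir / f"{test_name}.received.txt"
    approved_dir.mkdir(exist_ok=True)

    if not approved_file.exists():
        # First run: nothing approved yet. Save the output so a human can
        # inspect it and, if it is good enough, rename it to *.approved.txt.
        received_file.write_text(actual_output)
        raise AssertionError(f"No approved output yet; inspect {received_file}")

    approved = approved_file.read_text()
    if actual_output != approved:
        # Subsequent runs: any difference from the approved version fails the test.
        diff = "\n".join(difflib.unified_diff(
            approved.splitlines(), actual_output.splitlines(),
            fromfile="approved", tofile="actual", lineterm=""))
        raise AssertionError("Output differs from the approved version:\n" + diff)
```

On the first run the check fails with a prompt to inspect the received output; once a human has approved it by renaming the file, later runs diff against that approved version.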
If you look at something like the "Minesweeper" Code Kata, the problem is defined in terms of ASCII art input (a rectangular field of mines) and expected output (the same rectangular field but with both mines and adjacent mine counts). This works very well with an approval-based approach, since each test case comprises an input minefield and an approved output solution. When the test fails, a diff generally shows you clearly what is wrong. In fact, I’ve made it into a little approval testing exercise and released it publicly on GitHub: Minesweeper Approval Kata.
You can design some new minefields (perhaps genuine corner cases?) and turn them into approval tests. I’ve done something similar with the Yatzy and Gilded Rose katas too, to show that the output doesn’t have to be graphical like this; the same approach works.
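To make the Minesweeper example concrete, here is a hedged sketch of how such a test case might be organised: a plain-text input minefield paired with a previously approved plain-text solution. The small solver and the inline strings are illustrative only and are not the code from the Minesweeper Approval Kata repository.

```python
# Illustrative sketch: both the test input and the approved output are
# plain-text minefields, so a failing test shows up as a readable diff.

def annotate(minefield: str) -> str:
    """Replace each empty square with the count of adjacent mines."""
    grid = minefield.splitlines()
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        line = []
        for c in range(cols):
            if grid[r][c] == "*":
                line.append("*")
            else:
                count = sum(
                    grid[rr][cc] == "*"
                    for rr in range(max(r - 1, 0), min(r + 2, rows))
                    for cc in range(max(c - 1, 0), min(c + 2, cols)))
                line.append(str(count))
        out.append("".join(line))
    return "\n".join(out)

if __name__ == "__main__":
    input_field = "*...\n..*.\n...."   # the test input: a 3x4 minefield
    approved    = "*211\n12*1\n0111"   # the previously approved solution
    actual = annotate(input_field)
    print(actual)
    assert actual == approved, "solution differs from the approved output"
```

A new corner-case minefield becomes a new pair of input and approved text; no new assertion code is needed.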
InfoQ: How does approval testing differ from assertion-based testing?
Bache: In an "assertion-based" test you have to pick an aspect of the output to check, and write code specifically to check it. We often talk about a test having three parts: arrange, act, assert. In approval testing the first two parts are largely the same; it’s the third part that is altered. The test designer still decides what scenario to test, and how to trigger it, but doesn’t define up front what the output should be, at least not in any detail. As soon as you have the first two parts of the test set up, you can start running it and evaluating the output. If the output is incomplete or sketchy you can keep working on the production code until the output looks to be worth keeping. When you’re happy enough with it to "approve" the result and share the test with your team, it’s not necessarily the end of the story. The test is inherently "agile" in that you can revisit the approved version and easily update it when the requirements change.
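The contrast Bache draws can be sketched in a few lines of Python. The tiny invoice example below is hypothetical; the point is that arrange and act are identical in both styles, while the checking step changes from hand-written assertions to a comparison of the whole rendered output against an approved version.

```python
# Hypothetical invoice example: arrange and act are the same in both
# styles; only the third step differs.
from dataclasses import dataclass

@dataclass
class Invoice:
    customer_number: str
    payable_amount: int

def render_invoice(inv: Invoice) -> str:
    return f"Customer: {inv.customer_number}\nAmount due: {inv.payable_amount}\n"

APPROVED = "Customer: 1234\nAmount due: 100\n"   # previously approved output

def test_assertion_style():
    invoice = Invoice("1234", 100)                # arrange + act
    assert invoice.payable_amount == 100          # assert: aspects chosen up front
    assert invoice.customer_number == "1234"

def test_approval_style():
    invoice = Invoice("1234", 100)                # arrange + act (unchanged)
    assert render_invoice(invoice) == APPROVED    # compare whole output to approved
```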
InfoQ: How can you use TextTest in approval testing?
Bache: TextTest is an open source tool (http://texttest.org) that I and others have been using and developing for quite some years now. It’s designed to be language-agnostic, in that it tests at the level of the executable program rather than individual functions or classes. For example, I’ve used it to test programs written in Java, Scala, C++, Python and Ruby.
As well as testing whole programs, TextTest also expects to be comparing plain text output. Of course, not all programs produce plain text as part of their normal functioning, but most produce something that can be converted to plain text. For example, PDF or HTML documents can be converted to plain text, complex user interfaces can be rendered in ASCII art, and databases can be queried to produce plain text reports. In my experience it’s worth the effort to create a test harness for your application that does this kind of conversion, and it will probably be re-used in many test cases. Once your tests consist largely of a directory structure of files containing plain text, you have a host of power tools at your disposal: search, diff, regular expressions and version control, to name a few.
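As a rough sketch of the kind of test harness Bache mentions, the script below runs an application, converts its HTML report to plain text using only the Python standard library, and prints the result so that TextTest (or any diff tool) can compare it with an approved version. The program name and report file are assumptions made for the example.

```python
# Rough sketch of a test harness that turns non-plain-text output into
# plain text for comparison. The "./generate_report" program and the
# report.html path are hypothetical.
import subprocess
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, one chunk per line."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

if __name__ == "__main__":
    # Run the application under test; it is assumed to write report.html.
    subprocess.run(["./generate_report", "--out", "report.html"], check=True)

    extractor = TextExtractor()
    with open("report.html") as f:
        extractor.feed(f.read())

    # Plain text on stdout is what gets captured and diffed.
    print("\n".join(extractor.chunks))
```

A harness like this is the sort of executable TextTest would run for each test case, capturing the plain text it prints and comparing it against the approved files.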
InfoQ: When would you do approval testing and which are the benefits?
Bache: I usually use approval testing for end-to-end or API tests, where you want to confirm a whole feature works as the user expects. I think it follows naturally from the need to "approve" output: you want the test to be based around something a user or domain expert can read and understand. If you can show them the test output in a tool they’re familiar with, like a PDF document, a webpage, or a screenshot of the GUI, then they are more likely to trust that the test is checking something they care about. An assertion-based test is often much harder for them to relate to.
The other thing about assertion-based tests is that they involve writing new assertion code for every new test case. With approval testing you can often re-use the test harness you wrote for the previous test case. Overall I find there is less code to maintain.
I often use this approach with legacy code, that is, code that is still used but which lacks test coverage and is hard to change without breaking it. Just by triggering different scenarios and approving whatever the program does as correct, you can quickly get a suite of regression tests in place.
Those are all useful aspects, but really the "killer feature" of approval testing is the ability to find defects you didn’t anticipate when you designed the test. With an assertion-based test you have to explicitly decide some aspects of the output to check. You assert that the square in the top left corner is adjacent to two mines. That the invoice shows the payable amount, the bank account number and the customer number. But what if the minefield is missing the last column? Or the invoice includes a stack trace instead of a delivery address? Will you remember to write an assertion for that?
By default, approval testing will diff the whole output, and in my experience has a better chance of finding these kinds of issues. Actually, what often happens is that there are parts of the output that you don’t want to check. You might be displaying today’s date, a process ID, or a random number. These things change all the time, and will need to be filtered out before you do the comparison against the approved version. But still, the role of the test designer is changed: instead of picking out a few things to assert on, you pick out a few things to ignore. Overall you’re much more likely to catch those unanticipated changes that would otherwise ruin your day.
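The "pick out a few things to ignore" step can be illustrated with a small normalisation pass applied to both the actual and the approved text before diffing. The regular expressions below are example filters, not a definitive list.

```python
# Sketch of filtering run-dependent text before comparison: dates,
# process ids and random tokens are replaced by stable placeholders.
import re

FILTERS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "<DATE>"),            # today's date
    (re.compile(r"process id: \d+"), "process id: <PID>"),   # process ids
    (re.compile(r"session [0-9a-f]{8}"), "session <ID>"),    # random tokens
]

def normalise(output: str) -> str:
    """Apply all filters so only meaningful differences show up in a diff."""
    for pattern, replacement in FILTERS:
        output = pattern.sub(replacement, output)
    return output
```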
InfoQ: Where can testers go if they want to learn more about approval testing?
Bache: Well firstly I don’t think this is a technique just for testers. I think all developers benefit from testing their code, and you can use this technique to work in a test-driven way while developing new functionality. In my experience you get more benefits from a test suite like this if developers, testers and expert users/product owners work together on it.
I think an excellent way to learn a new technique is to try it out on a problem you’re already familiar with. If you’ve previously worked on any of the Minesweeper, Yatzy or Gilded Rose Code Katas, I do encourage you to take a look at the approval testing versions of them on my GitHub page: Yatzy Approval Kata, GildedRose Approval Kata. Once you’ve had a go, you can compare your tests with my "sample solution" version of each. I plan to add more such exercises there; do drop me a line if you have a particular favourite Code Kata you’d like to see solved this way.
There are several articles and even scientific papers available, if you’d like to read more about the background, concrete experiences and theory behind the approach. For example, I’ve written a chapter about approval testing in my (work in progress) book, Mocks, Fakes and Stubs; it’s included in the free sample you can download. An academic did a study of a project I was involved with, and the results are published in this paper on the industrial applicability of TextTest. I have an article on my blog about approval testing which describes how other people also use this kind of approach to testing. Llewellyn Falco in particular has produced a lot of material on the topic, including screencasts and videos; see Using ApprovalTests in .Net. You can also find articles and user documentation on the TextTest website.