Understanding Legacy Code with Characterization Testing

Alberto Savoia has written a series of four articles on characterization tests, which are essentially unit tests added after the fact to describe and document legacy code. The term "characterization test" was coined by Michael Feathers in his book "Working Effectively with Legacy Code", as Alberto explains in his first article:

Michael Feathers defines characterization tests as tests that characterize the actual behavior of a piece of code. In other words, they don’t check what the code is supposed to do, as specification tests do, but what the code actually and currently does... Having a set of characterization tests helps developers working with legacy code because they can run those tests after modifying their code and make sure that their modification did not cause any unintended or unwanted changes in functionality somewhere else.
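As a rough illustration of that definition, here is a minimal sketch in Python using the standard unittest module. The function format_price is hypothetical and is not taken from Alberto's articles; it simply stands in for a piece of legacy code with a quirk that nobody has documented. The tests assert what the code does today, including the quirk, rather than what it arguably should do.

```python
import unittest


def format_price(amount):
    """Hypothetical legacy function, invented for this illustration."""
    # Quirk: abs() silently drops the sign of negative amounts.
    return "$" + str(round(abs(amount), 2))


class FormatPriceCharacterization(unittest.TestCase):
    """Each test pins down behavior observed by running the current code."""

    def test_whole_number(self):
        self.assertEqual(format_price(5), "$5")

    def test_rounds_to_two_decimals(self):
        self.assertEqual(format_price(3.14159), "$3.14")

    def test_negative_amount_loses_its_sign(self):
        # Probably not what anyone intended, but it is the actual behavior,
        # so the characterization test records it as-is.
        self.assertEqual(format_price(-2.5), "$2.5")
```

If a later change makes the negative-amount test fail, that is not automatically a regression; it is a signal that observable behavior changed and that someone should decide whether the change was wanted.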

But where does a developer begin - and end - writing characterization tests?

Michael Feathers provides the following heuristics for writing characterization tests:

Write tests for the area where you will make your changes. Write as many test cases as you feel you need to understand the behavior of the code.

After doing this, take a look at the specific things you are going to change, and attempt to write tests for those.

If you are attempting to extract or move some functionality, write tests that verify the existence and connection of those behaviors on a case-by-case basis. Verify that you are exercising the code that you are going to move and that it is connected properly. Exercise conversions.

In his second article, Alberto serves up a piece of confusing legacy code, and shows how he uses characterization tests to unravel its meaning:

It’s hard to fully understand what’s actually going to happen by just looking at the code; you’d have to simulate the various paths of the code’s execution in your head, keep track of variable values, etc. – tough things to do for anything but the most trivial code. Fortunately, as you will see, with characterization testing we don’t have to do that. We don’t look at the code to gain complete understanding of what it does; we look at the code for clues and suggestions on what to test.
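The workflow described in that passage can be sketched roughly as follows. This is an assumption-laden illustration in Python, not Alberto's example: legacy_discount, record_behavior, and check_pinned are hypothetical names invented here. The idea is to read the code only for clues about interesting inputs (here the thresholds 0, 10, and 100), run it, and freeze whatever it returns as the expected values.

```python
def legacy_discount(quantity, is_member):
    """Hypothetical legacy routine with tangled branching."""
    rate = 0
    if quantity > 10:
        rate = 10
        if is_member:
            rate += 5
    elif is_member and quantity > 0:
        rate = 5
    if quantity > 100:
        rate = rate * 2 - 5
    return rate


def record_behavior(func, cases):
    """Run func on each argument tuple and record what it actually returns."""
    return {args: func(*args) for args in cases}


# Inputs picked from clues in the code, not from a full understanding of it.
CASES = [(0, True), (1, False), (1, True), (11, False), (11, True),
         (101, False), (101, True)]

observed = record_behavior(legacy_discount, CASES)

# After a quick review, the observations are frozen as expected values.
PINNED = {
    (0, True): 0,
    (1, False): 0,
    (1, True): 5,
    (11, False): 10,
    (11, True): 15,
    (101, False): 15,
    (101, True): 25,
}


def check_pinned(func):
    """Return the inputs whose current output differs from the pinned output."""
    return [args for args, expected in PINNED.items() if func(*args) != expected]
```

Calling check_pinned(legacy_discount) returns an empty list for the unmodified code. After a refactoring, any input that shows up in the returned list marks a behavior change that needs a decision.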

In the third article, Alberto uses his sample characterization tests to fix a bug and to refactor the code so that it is easier to understand. He also points out that code coverage tools are especially helpful with characterization testing, because they reveal whether the tests do in fact "characterize" the system: if the tests miss paths through the legacy code, then either that code is dead or the tests do not yet describe all of the existing behavior.

It may seem that a base set of characterization tests could be written automatically, and Alberto has created a tool that does just that: JUnit Factory, an online characterization test generator (Eclipse plug-in available). In his fourth and final article, Alberto uses JUnit Factory to generate characterization tests for the example code, and then compares them with the tests he wrote himself. The conclusion? Automatic generation of characterization tests gives developers an excellent starting point for understanding and working with legacy code.
