Test Driven Development and the Trouble with Legacy Code

| by Mark Levison Follow 0 Followers on Nov 19, 2009. Estimated reading time: 3 minutes |

Complex MazeAllan Baljeu was trying to TDD with a legacy C++ code base, he was running into trouble because:

we end up with classes that don't fully implement the functionality that's eventually needed, and when others come around to use those classes, and eventually fuller implementations are required, then it turns out that the original design is not adequate, a new design is required, some expectations (tests) need to change and previous uses of the class need to be updated.

He wondered if Big Design Up Front would help solve the problem. George Dinwiddie, Agile Coach, suggested that Alan’s design was trying to tell him something. You have to pay attention to the fundamentals of clean code. You can look at basic coupling and cohesion (i.e. SOLID).

Mike “Geepaw” Hill, Agile Coach, says that in his years of coaching agile teams, one of the following has been at the root of these problems:

  • team is not yet up to speed on refactoring, so your classes aren't really
  • team is not yet skilled at simplicity, so ditto
  • team is not yet doing aggressive & rapid microtesting (aka unit testing), so changes break tests too often
  • team doesn't know how to handle cross-team or company-to-public dependencies, e.g. shipping api's
  • team neither pairing nor open workspacing, dramatically slowing team-wide understanding.
  • team likely has no jiggle-less build
  • team could be using tools from the '40s

Keith Ray, XP Coach, suggests that with legacy code (i.e. systems with high technical debt) the cost of repaying technical debt dominates the cost of implementing a story. He goes on to offer an approach:

To make the code more well-factored (paying down the technical debt), whenever you need to integrate a new feature into it, you should pay close attention to code smells in both the new code and the old code and consider refactoring to deal with each smell as you recognize it.

You can do refactorings in small safe steps (even in C++) manually. Very closely follow the instructions in Fowler's book on Refactoring until you learn them by heart. Eclipse with gcc has a few refactorings that actually work: Extract Method and Rename. Rename understands scope, so it is safer than search-and-replace. Extract Method and the other refactorings in Ecipse might be buggy, so be careful when you use them. For things like changing a function signature, "lean on the compiler" to show where changes have to be made.

You also need tests to make sure the refactorings are not damaging the existing features. Feather's book on working with legacy code has lots of techniques for adding tests to legacy code. On a higher level, code smells are violations of good design principles. For example, the Single Responsibility Principle (SRP) says there should one purpose for every class / method / module. There are principles about coupling and cohesion and managing dependencies, etc. It's often easier to detect a code smell than it is to apply these abstract principles. "Large Class" and "Large Method" are remedied by "Extract Class" and "Extract Method/Move Method", though knowing SRP helps in deciding what parts of a class or method should be extracted.

Perhaps the most important design principle is "Tell, don't ask": keep functionality and data together.... bad code often has the functionality in one place, and gets the data it needs from other places, creating problems with dependencies and lack of locality -- symptomized by "adding a new feature requires changing lots of code". The code smells "Shotgun Surgery", "Feature Envy", "Long Parameter List" are applicable here.

Getting fast feedback will allow more refactoring, which will (eventually) allow faster development of new features. Try to get parallel builds happening (distributed compilation). Try to get smaller source files and smaller header files. Reduce the complexity of header files - use forward declarations, avoid inline code, try to keep only one class per header file / source file. Using the "pimpl" idiom widely can decrease compile time by 10%, but it can also disguise the "Large Class" and "Feature Envy" code smells.

The advantage of refactoring instead of rewriting, is that you always have working code. If your manual and automated tests are good, then you should be able to ship the code, even if it is a half-way state between a bad design and a good design.

Keith also wrote “Refactoring: Small Steps Guaranteed to Help You Clean Up Your Code” an article on refactoring C++ code in Better Software Magazine.

Previously on InfoQ: Dealing with Legacy Code, Uncle Bob On The Applicability Of TDD and Making TDD Stick: Problems and Solutions for Adopters

Rate this Article

Adoption Stage

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

good blog - another reference by Alan Shalloway

Good post but can be very dangerous to refactor legacy code without automated acceptance tests as there is no assurance you are not breaking things. Typically, first step to lowering technical debt is writing a better acceptance test set. Also, Working Effectively With Legacy Code by Michael Feathers is a must in these situations.

Re: good blog - another reference by Mark Levison

Allan - we're agreed. FWIW I did reference Micheal's book in Keith's quote. Perhaps I should make the reference a bit more explicit.

Mark Levison
The Agile Consortium.

Re: good blog - another reference by Alan Shalloway

Ha! Hit my unconscious apparently, not my conscious. Yes, good rec.

Re: good blog - another reference by C Curl

Having done 10+ legacy rewrites, typically, the first step is to chuck code out. Legacy codebases are like engine compartments that have been filled with old junk, replacement parts that were never installed, broken parts that were never taken out, parts that don't even belong.

As bug-fixes, releases, upgrades etc. have come and gone, these systems have accumulated lots of code that just isn't necessary, for example for:
1. fear that one just might need that code again at some point in the future, in the method that's no longer called
2. have tried doing the same thing in several different ways (e.g xml parsing, property configuration, dependency injection, ejbs/pojos, etc.)
3. poor algorithms, not using the available api's properly, new methods in newer versions of apis
4. left over code to handle dependencies on systems that have long since been removed
5. massive, home-grown build system
and so on

Your IDE can usually help you, if not then grep is your friend. But you still need to go through the laborious task of identifying this code, hypothesize which code might be obsolete, verify that there are no references to it, that it doesn't do anything useful, then chuck it.

Re: good blog - another reference by Mark Levison

C Curl - we have similar habits, but its painful and slow. The worst part if your colleagues aren't helping you they will be writing more obsolete code too. Personally I like Micheal Feathers defn of legacy code: Any code checked into the repository without unit tests.

Mark Levison
The Agile Consortium

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

5 Discuss