InfoQ

Test Driven Development and the Trouble with Legacy Code

Posted by Mark Levison on Nov 19, 2009

Sections: Process & Practices, Architecture & Design, Development
Topics: Agile Techniques, Agile, Programming, Architecture
Tags: TDD, Testing, Legacy Code

Allan Baljeu was trying to use TDD with a legacy C++ code base and was running into trouble because:

we end up with classes that don't fully implement the functionality that's eventually needed, and when others come around to use those classes, and eventually fuller implementations are required, then it turns out that the original design is not adequate, a new design is required, some expectations (tests) need to change and previous uses of the class need to be updated.

He wondered if Big Design Up Front would help solve the problem. George Dinwiddie, Agile Coach, suggested that Allan’s design was trying to tell him something: you have to pay attention to the fundamentals of clean code, and you can look at basic coupling and cohesion (i.e. SOLID).

Mike “Geepaw” Hill, Agile Coach, says that in his years of coaching agile teams, one of the following has been at the root of these problems:

  • team is not yet up to speed on refactoring, so your classes aren't really minimal
  • team is not yet skilled at simplicity, so ditto
  • team is not yet doing aggressive & rapid microtesting (aka unit testing), so changes break tests too often
  • team doesn't know how to handle cross-team or company-to-public dependencies, e.g. shipping api's
  • team neither pairing nor open workspacing, dramatically slowing team-wide understanding.
  • team likely has no jiggle-less build
  • team could be using tools from the '40s

Keith Ray, XP Coach, suggests that with legacy code (i.e. systems with high technical debt) the cost of repaying technical debt dominates the cost of implementing a story. He goes on to offer an approach:

To make the code more well-factored (paying down the technical debt), whenever you need to integrate a new feature into it, you should pay close attention to code smells in both the new code and the old code and consider refactoring to deal with each smell as you recognize it.

You can do refactorings in small safe steps (even in C++) manually. Very closely follow the instructions in Fowler's book on Refactoring until you learn them by heart. Eclipse with gcc has a few refactorings that actually work: Extract Method and Rename. Rename understands scope, so it is safer than search-and-replace. Extract Method and the other refactorings in Eclipse might be buggy, so be careful when you use them. For things like changing a function signature, "lean on the compiler" to show where changes have to be made.
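As an illustration of what one such small step might look like, here is a minimal sketch of a manual Extract Method in C++. The `LineItem` type, the function names, and the discount rule are all hypothetical, invented for this example; they do not come from Allan's code base or from Fowler's book.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type, for illustration only.
struct LineItem {
    int unit_price_cents;
    int quantity;
};

// Before: one function mixes iteration, pricing, and the discount rule.
int invoice_total_before(const std::vector<LineItem>& items) {
    int total = 0;
    for (const LineItem& item : items) {
        int line = item.unit_price_cents * item.quantity;
        if (item.quantity >= 10) {
            line -= line / 10;  // 10% bulk discount (assumed rule)
        }
        total += line;
    }
    return total;
}

// Step 1 (Extract Method): move the per-line pricing into its own function.
int line_total(const LineItem& item) {
    int line = item.unit_price_cents * item.quantity;
    if (item.quantity >= 10) {
        line -= line / 10;
    }
    return line;
}

// Step 2: the original function now delegates; behavior should be unchanged.
int invoice_total(const std::vector<LineItem>& items) {
    int total = 0;
    for (const LineItem& item : items) {
        total += line_total(item);
    }
    return total;
}

// A quick check to rerun after every step while both versions still exist.
void check_invoice_total_unchanged() {
    const std::vector<LineItem> sample{{100, 2}, {50, 10}};
    assert(invoice_total(sample) == invoice_total_before(sample));
}
```

"Leaning on the compiler" would then apply to a follow-up change: if `line_total` later needed an extra parameter, changing its declaration first makes every call site that still uses the old signature fail to compile, which gives a complete list of places to update.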

You also need tests to make sure the refactorings are not damaging the existing features. Feathers' book on working with legacy code has lots of techniques for adding tests to legacy code. On a higher level, code smells are violations of good design principles. For example, the Single Responsibility Principle (SRP) says there should be one purpose for every class / method / module. There are principles about coupling and cohesion and managing dependencies, etc. It's often easier to detect a code smell than it is to apply these abstract principles. "Large Class" and "Large Method" are remedied by "Extract Class" and "Extract Method/Move Method", though knowing SRP helps in deciding what parts of a class or method should be extracted.
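One technique from Feathers' book is the characterization test: before refactoring, write tests that record what the code does today, quirks included, so that any change in behavior shows up immediately. The sketch below assumes a hypothetical `legacy_format_name` function and uses plain `assert` rather than a test framework, purely to keep the example self-contained.

```cpp
#include <cassert>
#include <string>

// Hypothetical legacy function whose exact behavior nobody remembers.
std::string legacy_format_name(const std::string& first,
                               const std::string& last) {
    if (last.empty()) return first;
    if (first.empty()) return last;
    return last + ", " + first.substr(0, 1) + ".";
}

// Characterization test: pin down the current behavior, including the
// edge cases, before touching the implementation.
void characterize_legacy_format_name() {
    assert(legacy_format_name("Ada", "Lovelace") == "Lovelace, A.");
    assert(legacy_format_name("Ada", "") == "Ada");       // no last name
    assert(legacy_format_name("", "Lovelace") == "Lovelace");  // no first name
    assert(legacy_format_name("", "") == "");             // both empty
}
```

The expected values here are discovered by running the existing code, not by deciding what it ought to do. Once they pass, they act as a safety net for the Extract Method and Extract Class steps described above.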

Perhaps the most important design principle is "Tell, don't ask": keep functionality and data together.... bad code often has the functionality in one place, and gets the data it needs from other places, creating problems with dependencies and lack of locality -- symptomized by "adding a new feature requires changing lots of code". The code smells "Shotgun Surgery", "Feature Envy", "Long Parameter List" are applicable here.
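A small sketch may make "Tell, don't ask" more concrete. The `AccountData` and `Account` types below are hypothetical and exist only to contrast the two styles; the overdraft rule is an assumption made for the example.

```cpp
#include <cassert>

// "Ask" style: the data is exposed, and every caller has to pull it out
// and apply the overdraft rule itself.
struct AccountData {
    long balance_cents = 0;
};

bool withdraw_ask_style(AccountData& account, long amount_cents) {
    if (account.balance_cents < amount_cents) return false;  // rule lives in the caller
    account.balance_cents -= amount_cents;
    return true;
}

// "Tell" style: the rule lives next to the data it protects, so callers
// simply tell the account what to do.
class Account {
public:
    explicit Account(long balance_cents) : balance_cents_(balance_cents) {
        assert(balance_cents >= 0);  // assumed invariant: no negative opening balance
    }

    // Returns false and leaves the balance untouched if the request is invalid.
    bool withdraw(long amount_cents) {
        if (amount_cents < 0 || balance_cents_ < amount_cents) return false;
        balance_cents_ -= amount_cents;
        return true;
    }

    long balance_cents() const { return balance_cents_; }

private:
    long balance_cents_;
};
```

In the "ask" version, a new rule (say, a daily limit) would have to be added to every function like `withdraw_ask_style`, which is the "Shotgun Surgery" smell. In the "tell" version, the change is confined to `Account::withdraw`.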

Getting fast feedback will allow more refactoring, which will (eventually) allow faster development of new features. Try to get parallel builds happening (distributed compilation). Try to get smaller source files and smaller header files. Reduce the complexity of header files - use forward declarations, avoid inline code, try to keep only one class per header file / source file. Using the "pimpl" idiom widely can decrease compile time by 10%, but it can also disguise the "Large Class" and "Feature Envy" code smells.
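For readers unfamiliar with the idiom, here is a minimal sketch of a forward declaration combined with "pimpl", assuming C++14 or later. The `Widget` class is hypothetical, and the header and source file are shown in one listing, separated by comments, only so that the example stays self-contained.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- widget.h (what clients include) ----
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                              // defined where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;

    const std::string& name() const;
    void rename(std::string new_name);

private:
    struct Impl;                            // forward declaration only
    std::unique_ptr<Impl> impl_;            // clients never see Impl's members
};

// ---- widget.cpp (the only file that needs the full definition) ----
struct Widget::Impl {
    std::string name;
};

Widget::Widget(std::string name)
    : impl_(std::make_unique<Impl>(Impl{std::move(name)})) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

const std::string& Widget::name() const {
    assert(impl_ && "Widget used after being moved from");
    return impl_->name;
}

void Widget::rename(std::string new_name) {
    assert(impl_ && "Widget used after being moved from");
    impl_->name = std::move(new_name);
}
```

Because `widget.h` only forward-declares `Impl`, changing the private members means recompiling `widget.cpp` alone rather than every file that includes the header. The cost is an extra allocation and indirection, and, as noted above, the hidden `Impl` can grow into a "Large Class" without anyone noticing.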

The advantage of refactoring instead of rewriting is that you always have working code. If your manual and automated tests are good, then you should be able to ship the code, even if it is in a half-way state between a bad design and a good design.

Keith also wrote “Refactoring: Small Steps Guaranteed to Help You Clean Up Your Code”, an article on refactoring C++ code, in Better Software Magazine.

Previously on InfoQ: Dealing with Legacy Code, Uncle Bob On The Applicability Of TDD and Making TDD Stick: Problems and Solutions for Adopters


  1. good blog - another reference

    by Alan Shalloway

    Good post, but it can be very dangerous to refactor legacy code without automated acceptance tests, as there is no assurance you are not breaking things. Typically, the first step to lowering technical debt is writing a better acceptance test set. Also, Working Effectively With Legacy Code by Michael Feathers is a must in these situations.

  2. Re: good blog - another reference

    by Mark Levison

    Alan - we're agreed. FWIW I did reference Michael's book in Keith's quote. Perhaps I should make the reference a bit more explicit.

    Cheers
    Mark Levison
    The Agile Consortium.

  3. Re: good blog - another reference

    by Alan Shalloway

    Ha! Hit my unconscious apparently, not my conscious. Yes, good rec.

  4. Re: good blog - another reference

    by C Curl

    Having done 10+ legacy rewrites, typically, the first step is to chuck code out. Legacy codebases are like engine compartments that have been filled with old junk, replacement parts that were never installed, broken parts that were never taken out, parts that don't even belong.

    As bug fixes, releases, upgrades etc. have come and gone, these systems have accumulated lots of code that just isn't necessary, for example:
    1. code in methods that are no longer called, kept out of fear that one just might need it again at some point in the future
    2. the same thing done in several different ways (e.g. XML parsing, property configuration, dependency injection, EJBs/POJOs, etc.)
    3. poor algorithms, available APIs not used properly, new methods in newer API versions left unused
    4. leftover code to handle dependencies on systems that have long since been removed
    5. a massive, home-grown build system
    and so on

    Your IDE can usually help you; if not, then grep is your friend. But you still need to go through the laborious task of identifying this code: hypothesize which code might be obsolete, verify that there are no references to it and that it doesn't do anything useful, then chuck it.

  5. Re: good blog - another reference

    by Mark Levison

    C Curl - we have similar habits, but it's painful and slow. The worst part is that if your colleagues aren't helping you, they will be writing more obsolete code too. Personally I like Michael Feathers' definition of legacy code: any code checked into the repository without unit tests.

    Cheers
    Mark Levison
    The Agile Consortium
