Dealing with Legacy Code

Legacy Code is Inevitable

There will be many times in our careers when we support legacy code. Sometimes you accept a new job and the legacy code is your first assignment, or maybe your company reorganizes and this product ends up in your lap. For whatever reason, it happens. You wanted to code something new and shiny but instead you are now the owner of a new-to-you and completely unfamiliar block of code. The code looks intricate, and now you have to wade in.

In fact, if I can stretch the definition a bit, you can consider any code written before today to be legacy code. Have you ever tried to revisit code that you wrote six months ago? It's not always as easy as you'd hope to support your own code, much less someone else's. Both situations are challenging if you don't follow some basic guidelines.

The traditional approach is to start making changes while doing your best to avoid unintentional collateral damage. Unfortunately, because the code is unfamiliar, you aren't sure what's really going to happen when you change a data structure or update a variable.

Rather than wandering into this minefield blindly, let's create a plan of attack. Don't just start making changes and hope that everything still works. Instead, take aim and hit it out of the park with a "BAT".

Here's a three-pronged plan you can use to attack the problem: build, automate, and test. Use this BAT for your legacy code and create a safety net for yourself. The BAT approach helps ensure that your code continues to work the way you want it to work. It quickly catches unintended side effects and helps you to eliminate them.

I'd like to challenge you to look at how you handle your legacy code in light of the BAT approach. See how your day-to-day work compares and see if you need to approach your work differently.

Build

The first problem to address is the build. It's difficult to ship and test a product unless you can reliably build it. Figure out how to cleanly build the product on your desktop and then script the process.

Occasionally this is a non-issue, but builds usually aren't nearly as clean as they should be. Often builds are limited to a single machine or a special environment. When teams pass code from owner to owner, it tends to accumulate odd build requirements. Each owner adds his or her own special case into the mix. There have been too many cooks in the kitchen by the time you inherit the mess.

A complicated build can trigger cascading problems across the entire product.

When a thing is difficult, people do it less often. When a build is difficult, people build less often. It's just human nature. The ability to run a clean build often becomes a dark art mastered by only a few people in your shop. No one wants the task because it's so difficult and painful.

Since you can't test what you haven't built, testing becomes less frequent. When people finally run their tests, they find more bugs, because infrequent testing gives bugs more time to accumulate. If you are running tests daily, you'll only have one day's worth of bugs to report. However, if you wait six months to test, you'll have a lot more issues to pin down.

So your testing becomes burdensome. The testers get tired of all the work in a testing cycle so they try to avoid testing. Entering dozens or hundreds of bugs is boring work that no one enjoys.

Developers start to dread the testing cycle because they feel bombarded and attacked by all the bug reports. So the developers start resenting and harassing the testers, which makes the testing cycle even more painful. It's a very destructive feedback loop.

The complicated build causes problems for the entire product lifecycle, so be sure your build is clean.

When anyone can build, anyone can test. Testing is run more frequently, leading to smaller groups of bug reports. Having less work to do at a time is less of a chore. Anyone will move a bucket of paint and not think twice, but ask someone to move five hundred buckets and see what they say.

Your goal is to create a clean build that runs on any development machine and is easy to maintain. Use a build scripting tool or language, like Rake, Ant, Maven or NAnt. These high-level build languages let you focus on building your application, instead of build, language, or platform details.

When you can build your product with a single command (like "ant all"), you can move on to the next step. Be sure to test this on more than one machine.
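To make "a single command" concrete, here is a minimal sketch in Python. It is not a replacement for Ant, Rake, or the other tools named above, just an illustration of the idea: every step is scripted, steps run in a fixed order, and the build stops at the first failure. The step names and commands are placeholders I made up; swap in whatever your product really needs.

```python
import subprocess
import sys

# Hypothetical build steps. Each command here is a harmless placeholder;
# replace it with your project's real clean/compile/package command.
BUILD_STEPS = [
    ("clean", [sys.executable, "-c", "print('cleaning')"]),
    ("compile", [sys.executable, "-c", "print('compiling')"]),
    ("package", [sys.executable, "-c", "print('packaging')"]),
]


def run_build(steps):
    """Run each step in order and stop at the first failure.

    Returns the name of the failing step, or None if every step succeeded.
    """
    for name, command in steps:
        print(f"[build] {name}")
        result = subprocess.run(command)
        if result.returncode != 0:
            return name
    return None


if __name__ == "__main__":
    failed = run_build(BUILD_STEPS)
    if failed is not None:
        sys.exit(f"build failed at step: {failed}")
    print("build succeeded")
```

If a script like this only works on your own desktop, you have found one of those odd build requirements early, which is exactly the point of trying it on a second machine.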

Automate It

Now that you can build the product on an arbitrary machine automatically, let's automate.

The goal is to automate the entire build and test cycle on a clean machine with an absolute minimum of human intervention. It's not always possible to have everything completely automated but we want to reach a place where we script everything that can be reasonably automated.

Sometimes it's easier to install and configure a supporting piece of the software than to write a script to do it automatically. Applications that you install only once are prime candidates. Things like compilers, runtime libraries, and pre-existing data sets fall into this category. Don't try to reproduce the data set that takes you two hours to recreate. However, if you can rebuild a representative data set in thirty seconds, you should build from scratch. The benefits of starting with a clean, known data set are immense, but not always practical. Don't stretch a fifteen-minute test run into an hour by rebuilding all of your data.
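When a representative data set is cheap to rebuild, starting from scratch can look something like the sketch below. It assumes an SQLite in-memory database purely for illustration; the table, the rows, and the fresh_dataset name are all invented for this example and are not from any real product.

```python
import sqlite3

# Hypothetical schema and rows, chosen only for illustration.
SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
SEED_ROWS = [(1, "Joe"), (2, "Mary"), (3, "Fred")]


def fresh_dataset():
    """Return a new in-memory database holding the same known rows every time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", SEED_ROWS)
    conn.commit()
    return conn
```

Each test that calls fresh_dataset() gets its own copy, so whatever one test deletes or changes never leaks into the next one.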

Be sure to document any manual steps thoroughly and keep all the instructions around for anyone else that might want to duplicate the environment.

On the other hand, why should you sit around and watch complete builds all day long? Your time is valuable. Your IDE probably handles incremental builds and small unit test runs for you anyway. In most cases, this partial coverage is good enough. Having developers run a set of Smoke Tests, targeting active code areas, will cover most situations.

Smoke Tests

A Smoke Test is a short collection of tests that target the areas of the code that are actively being changed. Your Smoke Tests don't even try to exercise the entire product. A good set of Smoke Tests will be rotated... they aren't permanent. When you start working on a different product area, move out the old Smoke Tests and cycle in others. You can select tests from your complete test suite to run in your Smoke Test suite. I usually just add them to an Ant target (called smoke-test) that runs selected tests.
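The same idea, picking a rotating handful of tests out of the full suite, can be sketched with Python's unittest module instead of an Ant target. The test classes below are stand-ins I invented, and build_smoke_suite is a hypothetical helper; only the selection mechanism matters here.

```python
import unittest


class ReportTests(unittest.TestCase):
    """Stand-in tests; a real suite would exercise product code."""

    def test_daily_report(self):
        self.assertEqual(sum([1, 2, 3]), 6)

    def test_monthly_report(self):
        self.assertEqual(len("report"), 6)


class ImportTests(unittest.TestCase):
    def test_daily_import(self):
        self.assertEqual("a,b".split(","), ["a", "b"])


# The tests that cover the code being changed right now.
# Rotate this list when work moves to a different product area.
SMOKE_TESTS = [
    (ReportTests, "test_daily_report"),
    (ImportTests, "test_daily_import"),
]


def build_smoke_suite(selected):
    """Build a suite holding only the selected (class, method name) pairs."""
    suite = unittest.TestSuite()
    for test_class, method_name in selected:
        suite.addTest(test_class(method_name))
    return suite
```

Running it is one line: unittest.TextTestRunner().run(build_smoke_suite(SMOKE_TESTS)). The monthly report test stays in the full suite but is skipped during smoke runs.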

We still need a clean build and complete test run periodically. This is how we verify that no one forgot to commit a changed file or broke the product in an unexpected way. Smoke Testing often misses these types of breaks, so to keep the product in the best shape possible, we need to run the entire suite fairly frequently.

Instead of asking every developer to build the entire system from scratch and run every test available five times a day, tasks that can take quite a while, we're going to ask another computer to do that for you. Since we're tasking a computer to perform the automated build and test run, there's no reason it can't run more than once a week. In fact, there's no reason this cycle can't run after every code change.

What's the best way to set up this type of automation? The quickest and easiest way is to use a Continuous Integration product. A CI product watches your code, builds after each change, runs your tests, and then notifies everyone involved.

[Figure: Continuous Integration]

A CI system creates a fast feedback loop. When you change your code, you'll find out if anything was broken before you forget why you made the changes. (For more information on CI systems, visit my CI page at http://www.jaredrichardson.net/ci.html).

It's all about the pace of your development and keeping you moving. Having to revisit code edits from last week or last month is a poor way to keep things rolling. Catch and fix your problems within the hour and keep your team moving forward.

Here's how the system works. You edit the code on your desktop until you're sure you've got the feature completed or the bug fixed, and then you put your changes into your source code management (SCM) system. Your Continuous Integration system is monitoring your SCM and sees that the code has changed. It then checks out the product, builds it, and runs all your tests. You'll immediately get an email telling you whether the compilation and test run passed or failed.
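Stripped down to its core, the cycle just described is a small loop. The sketch below shows one pass of it. This is not how any particular CI product is implemented; ci_cycle and its three hooks (current_revision, build_and_test, notify) are hypothetical names standing in for your SCM, your build script, and your email system.

```python
from typing import Callable, Optional


def ci_cycle(
    last_seen: Optional[str],
    current_revision: Callable[[], str],
    build_and_test: Callable[[str], bool],
    notify: Callable[[str, bool], None],
) -> str:
    """Run one polling pass and return the newest revision seen.

    The three callables are hypothetical hooks:
      current_revision -- asks the SCM for the latest revision id
      build_and_test   -- checks out that revision, builds it, runs all tests
      notify           -- tells the team whether the run passed
    """
    revision = current_revision()
    if revision == last_seen:
        return revision  # nothing changed, nothing to do
    passed = build_and_test(revision)
    notify(revision, passed)
    return revision
```

A real system would call this on a timer (or from an SCM hook) and feed each return value back in as last_seen, so a revision is built once and only once.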

The concept is fairly simple but it's very powerful in action.

Test

The final section of our BAT is test automation. Now that our CI system is building the product and running our automated tests, we need to be sure that our tests cover the product properly. After all, a test that doesn't test the functionality the customer uses is pretty useless from the customer's point of view.

First, try to understand how your product is used... this can often be a challenge all by itself with a legacy product. You may even want to create a set of scenarios for customer usage. For example, you could have a scenario for creating daily reports, doing daily data imports, or adding new customers.

You may also wish to have categories of users, called "user personas". You could have "Joe the Power User", "Mary the System Administrator", or "Fred the New User". Each persona uses your product differently, just as a power user uses the product differently than a rank novice.

Next create Mock Client Tests to duplicate the most common customer usage scenarios.

Mock Client Tests

A Mock Client Test isn't a special testing framework. It's a test created to ensure your basic, expected functionality doesn't break. Quite often with legacy code, you'll change something in one part of the system and not know that it affects other areas of the product. I once worked on a product where changing the communication protocol affected the GUI component layout. Your Mock Client Tests, inside your CI system, are your insurance policy against accidental change. Test your product the way that you expect it to be used and you'll have a solid product baseline. Add tests to cover the more interesting cases later, as you encounter them.
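Here is a rough idea of what one Mock Client Test might look like, using the "adding new customers" and "daily reports" scenarios as the script. CustomerService is a made-up stand-in for your product's public entry point; in real legacy code you would call the actual API, command line, or service the customer uses.

```python
import unittest


class CustomerService:
    """Hypothetical stand-in for the legacy product's public entry point."""

    def __init__(self):
        self._customers = {}

    def add_customer(self, name):
        customer_id = len(self._customers) + 1
        self._customers[customer_id] = name
        return customer_id

    def daily_report(self):
        return sorted(self._customers.values())


class NewCustomerScenario(unittest.TestCase):
    """Mock client: add customers, then run the daily report."""

    def test_added_customers_show_up_in_daily_report(self):
        service = CustomerService()
        service.add_customer("Mary")
        service.add_customer("Joe")
        self.assertEqual(service.daily_report(), ["Joe", "Mary"])
```

Notice that the test only touches what a customer would touch. It never reaches into internals, so it keeps working while you refactor underneath it.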

A great testing strategy is Defect Driven Testing. Every time you find a bug in the system, add a new test that covers that defect. While you are adding the specific test, look for other tests you can add that are close but not quite the same. Over time, this strategy provides good coverage in the areas of the product that need coverage the most.
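As a small illustration of Defect Driven Testing, imagine a bug report saying that a blank quantity field crashes order entry. The function and the defect below are both invented for this sketch. The first test pins down the reported case, and the next two are the "close but not quite the same" neighbors.

```python
import unittest


def parse_quantity(text):
    """Hypothetical function under test: parse an order quantity field.

    The imagined defect: blank input used to raise ValueError.
    The fix shown here treats blank input as zero.
    """
    stripped = text.strip()
    if not stripped:
        return 0
    return int(stripped)


class QuantityDefectTests(unittest.TestCase):
    def test_blank_input_is_zero(self):
        # The exact case from the (hypothetical) bug report.
        self.assertEqual(parse_quantity(""), 0)

    def test_whitespace_only_is_zero(self):
        # Close neighbor: not reported, but likely to fail the same way.
        self.assertEqual(parse_quantity("   "), 0)

    def test_padded_number_still_parses(self):
        # Close neighbor: make sure the fix did not break normal input.
        self.assertEqual(parse_quantity(" 12 "), 12)
```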

Regardless of how you choose to add tests, make test addition a priority. Having a basic test suite in place is essential if you plan to make any changes to the product.

Your final step is getting the tests into your Continuous Integration system.

You get feedback very quickly on any problems when your automated tests run in your Continuous Integration environment. Every time you change the code you get a "from scratch" build and a complete test run. Most developers get addicted to this additional coverage very quickly and soon depend on this "extra team member."

  • Write Scenarios
  • Create Mock Client Tests
  • Use Continuous Integration

Summary

Build, automate and test (BAT) is good advice for anyone writing code, but it's an especially good formula for anyone inheriting legacy code. The ability to refactor with confidence is essential. I find it very difficult to be productive if I'm constantly looking over my shoulder to see what I'm breaking. A good test suite looks over my shoulder for me and lets me focus on the performance improvements I'm trying to make.

Remember, never change legacy code until you can test it, and never test purely by hand unless you have no other option.

Don't fear legacy code but handle it properly. Hit it with this BAT and you'll win every time.

About the author

Jared Richardson, co-author of Ship It! A Practical Guide to Successful Software Projects, is a speaker and independent consultant who specializes in using off-the-shelf technologies to solve tough problems.  With more than 10 years of experience, Jared has been a consultant, developer, tester, and manager, including Director of Development at several companies.  Until recently he managed a team of developers and testers at SAS Institute, Inc., and deployed a Continuous Integration System for nearly 300 projects, 5 million lines of code, and over 1,800 developers.  He also led a company-wide effort to increase the use of test automation.  Jared can be found online at http://www.JaredRichardson.net.
