Quality Code: Software Testing Principles, Practices, and Patterns, a book authored by Stephen Vance, covers different aspects of the software development lifecycle, with a focus on delivering quality products. In the book, Stephen discusses practices that support testing as a part of software craftsmanship. He talks about design techniques, like separating intent from implementation, with simple code examples. Some of the testing principles discussed in the book include the following:
- Verify Intent over Implementation
- Prefer Minimal, Fresh, Transient Fixtures
- Write Small Tests
- Separate Your Concerns
Stephen also covers topics like testability patterns and techniques for testing parallel code to expose problems such as race conditions and deadlocks.
InfoQ spoke with the author about the book and the best practices for testing application code.
InfoQ: First, can you define for our readers the term "Quality Code"? It may mean different things to different audiences. What is quality code?
Vance: In a sentence, I would say that quality code is code that, in order of importance, does what it is supposed to do, is bug free, and is well-crafted. Think of it as code that is ready for today, tomorrow, and next year. Code that does what it is supposed to do satisfies the business and the user. Code that is bug free tries to stand apart from an imperfect world and handles things gracefully when it inevitably has to interact with it. Code that is well-crafted can be fixed, modified, and enhanced far into the future, hopefully breaking the cycle of the value-sucking rewrite that traditionally happens every few years.
With the assumption that you know what your code is supposed to do, I try to tackle issues of how to faithfully turn that into code. Specifically, I want to make sure that developers know what testable code looks like and that they can build a proficiency in the implementation patterns of testing code.
InfoQ: Can you talk a little bit about the “Intent of Code”? If IDE tools were to provide a feature to manage the intent of code, how would it work?
Vance: You noted my reference to the mythical Intention Checker in the introduction to Chapter 2, huh?
All too often we write our code and then go about determining whether it really does what we wanted it to do. We were not fully intentional in writing it. The result may not be egregiously off the mark, but it almost certainly has bugs and most likely lots of superfluous elements. Intentional coding is about knowing exactly why you are writing every statement and ensuring that every statement serves that purpose.
I focus on this a lot in the book because really a test should be a statement of intent. Adopting that sensibility helps you refine your testing skills and produce code that does what it is supposed to do.
As a result, I’m a huge fan of test-driven development. It’s not about the tests. It’s about writing a test that expresses your intent to drive your design.
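To make that concrete, here is a minimal JUnit 4 sketch of test-driving; the PriceCalculator class and its discount rule are invented for illustration, and in real TDD the test would be written before the class exists:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class PriceCalculatorTest {
    // In TDD this test comes first; the class below is then written just
    // to make it pass. Both are inlined here so the sketch compiles.
    static class PriceCalculator {
        double priceWithDiscount(double total) {
            return total > 100.0 ? total * 0.9 : total;
        }
    }

    @Test
    public void ordersOverOneHundredGetATenPercentDiscount() {
        // The test name and assertion state the intent: they are the
        // specification that drives the design of PriceCalculator.
        PriceCalculator calculator = new PriceCalculator();
        assertEquals(180.0, calculator.priceWithDiscount(200.0), 0.001);
    }
}
```

The test reads as a statement of intent first and a verification mechanism second, which is the point of driving design from tests.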
As for the tools, if I had the answer, I would like to think the whole software engineering world would know. Turning intent into code really is the heart of the craft of programming. We’ve played with requirements, traceability, graphical programming, software factories, formal logic, generative frameworks, and more and still haven’t gotten very far representing semantics. We’re slowly developing higher and higher level constructs that bring us farther and farther from the metal, but the level of breakdown necessary to write software is so far from the granularity in which most people live that it’s still a specialized and error-prone process.
InfoQ: What are some considerations and gotchas of coding for error handling in applications?
Vance: After getting the basic functionality right, how your software handles errors is the single biggest determiner of its reputation. I’ve read that it can take four to ten positive reviews to balance one bad review when people are evaluating whether to buy or use something.
So why is it that error handling is so often addressed as an afterthought, receiving the least design attention and test coverage? Why do so many user-visible errors manifest as cryptic or useless messages? Part of the answer is that the error cases were never approached intentionally or tested.
At the code level, even languages like Java that can declare exceptions don’t require you to declare all of them. Java’s unchecked exceptions, which developers have increasingly been using exclusively, let you omit any explicit indication of your intent. Many languages don’t support declaring errors at all. Most libraries and frameworks do a poor job of documenting their errors, which means that even if you want to handle errors well in your software, you depend on your ability to reverse engineer the tools you use.
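A small Java illustration of that difference in declared intent (the method names and messages are invented for the example):

```java
import java.io.IOException;

public class ExceptionDeclarations {
    // Checked exception: the signature declares the error case, so
    // callers must either catch it or declare it themselves.
    void loadConfig(String path) throws IOException {
        throw new IOException("cannot read " + path);
    }

    // Unchecked exception: nothing in the signature signals that this
    // can fail, so the intent is invisible to callers and to tests.
    void parseConfig(String text) {
        throw new IllegalArgumentException("malformed config: " + text);
    }
}
```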
InfoQ: You discussed different testing aspects such as state and behavioral testing. Can you talk more about these and how developers should take advantage of the different testing techniques?
Vance: State testing looks at the data changes resulting from the actions in your software. The classic examples are the return value that is completely determined by the values of the input parameters or the changes in attribute values after executing a method on an object.
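As a sketch, a state-based test asserts only on the resulting data; the Counter class here is a hypothetical example, inlined so the snippet is self-contained:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CounterTest {
    // Hypothetical class under test.
    static class Counter {
        private int count;
        void increment() { count++; }
        int getCount() { return count; }
    }

    @Test
    public void incrementAddsOneToTheCount() {
        Counter counter = new Counter();
        counter.increment();
        // State testing: verify the data change, not how it happened.
        assertEquals(1, counter.getCount());
    }
}
```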
Behavioral testing looks at what gets executed when you exercise your software, such as the methods of other objects or calls out to services. For example, in a method whose sole purpose is to call other methods in the right order and with the right arguments, you would verify that those calls were executed as expected.
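A behavioral test of such an orchestrating method could look like the following sketch, which uses Mockito to verify the calls; the OrderProcessor and its collaborators are invented names:

```java
import static org.mockito.Mockito.*;
import org.junit.Test;
import org.mockito.InOrder;

public class OrderProcessorTest {
    // Hypothetical collaborators and orchestrator, inlined for the sketch.
    interface PaymentGateway { void charge(String orderId); }
    interface AuditLog { void record(String orderId); }

    static class OrderProcessor {
        private final PaymentGateway gateway;
        private final AuditLog log;
        OrderProcessor(PaymentGateway gateway, AuditLog log) {
            this.gateway = gateway;
            this.log = log;
        }
        // Sole purpose: call the collaborators in the right order.
        void process(String orderId) {
            gateway.charge(orderId);
            log.record(orderId);
        }
    }

    @Test
    public void processingAnOrderChargesThenLogs() {
        PaymentGateway gateway = mock(PaymentGateway.class);
        AuditLog log = mock(AuditLog.class);

        new OrderProcessor(gateway, log).process("order-42");

        // Behavioral testing: verify which calls happened, and in what order.
        InOrder inOrder = inOrder(gateway, log);
        inOrder.verify(gateway).charge("order-42");
        inOrder.verify(log).record("order-42");
    }
}
```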
Some functions may be pure examples of one or the other, but most have elements of both. Treating either as the exclusive approach to testing can inhibit or even hurt you. You may not have access to all of the state, and focusing exclusively on state may miss critical behaviors. Focusing on behavior runs the risk of coupling your tests to implementation details and of over-building your software to support that style of verification.
Guiding your testing intentionally and using the testing technique that best expresses your intent and least couples you to the implementation lets you use the best of both worlds. The distinction should fuel your mental model, not polarize your use of that model.
InfoQ: What is the balance between creating enough test methods and going overboard?
Vance: There are a lot of dimensions to that question. One aspect is whether you are writing redundant tests. Another is the response time for the testing feedback loop, i.e., whether you have so many tests your suite doesn’t run fast enough. Yet another depends on the level of the tests, such as unit/isolation, integration, API, and system.
The dimension I’ll focus on, the one I think you’re asking about and that is central to the testing discussion in so many venues, is tuning your investment. Just phrasing it that way telegraphs that it’s an “It depends” answer.
If you’re test driving, then the answer is different than if you’re simply trying to get test coverage, at least for the first stage of development. TDD is far more an approach to software creation than to software testing. It’s a disciplined approach to building quality in by expressing your intent in a test before writing the code and incrementally evolving your software.
From a test coverage perspective, it’s all about risk analysis. Say you’re in a startup trying to find its business model. How much do you want to invest in bullet-proofing software that you have a good chance of throwing out next week? On the other hand, can you get the software to sufficient quality that the bugs don’t scare away your potential customers without tests? If you developed your software without many tests and you decide you have found your business model, do you rewrite the software with a more disciplined approach, try to backfill the missing coverage to bring it to a more sustainable position, or forge on and hope you did a good enough job? Personal and company risk tolerance need to guide you.
As another example, what if you’ve inherited a software project that has no tests, but also requires little change or is going to be replaced shortly? You probably don’t want to invest in bringing the system completely under test. There may be some areas of critical functionality or frequent change that warrant it, however.
Your typical software that supports an ongoing business model that you hope to continue to grow in the coming years requires tests. Anecdotally, I’ve found that unit tests don’t start to pay off until around 50% statement coverage, and you don’t see strong benefits until 70-80% statement coverage. I’ve run testing regimens for high-reliability and safety-critical software where 100% statement, branch, and condition coverage were only milestones on the way to thorough testing, but there the consequences of system failure were significant.
Part of the unspoken thesis for the book is that testing to high coverage isn’t nearly as difficult as is generally thought. In fact, an early proposed title was “High Coverage Unit Testing.” The obstacle many face is understanding the mechanics of testability.
InfoQ: You discussed Software Archaeology in the last chapter. Can you talk more about it and how it helps software quality?
Vance: When you don’t have tests or documentation, you have to reverse engineer intent to bring software under test. You will run into times when your ability to reason about the software fails you. Sometimes it’s a limitation of your skill, insight, or concentration. Commonly, it’s due to vestigial code, changing contexts, obsolete behaviors, strange workarounds, and so forth.
In the section you mention, I couldn’t figure out where an exception was generated. The code did some things that it documented as hacks based on the contents of the exception message. I could have injected an exception that seemed to meet the expected patterns, but I would have been testing the implementation, not the intent.
Research uncovered that the strange handling belonged with a function called from an earlier version of the code that had long ago been refactored out. The exception-handling code had been copied along with it, but the original copy had also been left in place. As a result, there was unnecessary and confusing code, and it was hard to test. All of these are software quality anti-patterns that needed some archaeology to clear up.
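The anti-pattern at the heart of that story looks roughly like the following reconstruction; the names are invented, and this is not the actual code from the book:

```java
// Branching on exception-message text couples the code to an
// implementation detail that may no longer exist anywhere in the
// current code base.
public class MessageSniffingHack {
    interface Repository {
        void save(String record);
        void update(String record);
    }

    void saveOrUpdate(Repository repository, String record) {
        try {
            repository.save(record);
        } catch (RuntimeException e) {
            String message = e.getMessage();
            if (message != null && message.contains("duplicate key")) {
                // Hack: assume the record already exists and retry as an update.
                repository.update(record);
            } else {
                throw e;
            }
        }
    }
}
```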
The ability to research the evolution of code over time helps us make the code lean, clean, and testable when we need to backfill the tests.
Stephen also noted that the limitations of testing are usually due to poor design rather than to problems with testability itself:
“One of my favorite quotes comes from Henry Ford: “Whether you think you can, or you think you can’t – you’re right.” This applies to testing, as well. As I show in Chapter 13, even one of the hardest problems, that of reproducing race conditions, can be tamed in a large swath of situations. Difficulty in testing is more likely a smell that the code being tested has problems than an inherent problem with testability.”
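For a flavor of what taming a race can involve, here is a minimal sketch, not the book’s Chapter 13 technique, that uses a CountDownLatch to line up two threads so an unsynchronized increment is likely to lose updates:

```java
import java.util.concurrent.CountDownLatch;

public class RaceSketch {
    private int count = 0;

    // Unsynchronized increment: the read-modify-write is the race window.
    void increment() { count++; }

    public static void main(String[] args) throws InterruptedException {
        RaceSketch sketch = new RaceSketch();
        CountDownLatch start = new CountDownLatch(1);
        Runnable task = () -> {
            try {
                start.await(); // both threads are released together...
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            for (int i = 0; i < 100_000; i++) sketch.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        start.countDown(); // ...maximizing the chance of a collision
        t1.join();
        t2.join();
        // Typically prints less than 200000, exposing the lost updates.
        System.out.println(sketch.count);
    }
}
```

Synchronizing the contending threads’ starting points is one common way to make an intermittent race fire reliably enough to test against.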
About the Book Author
Stephen Vance has served in almost every software development-related role over the last 20 years. He has tackled problems including virtual reality, industrial robotics, Internet infrastructure, enterprise commerce, and software as a service across several industries. He has consulted, trained, and presented internationally on software development process and configuration management. He is currently a Lean/Agile Software Development Coach in Boston.