Bob Martin, co-author of the Agile Manifesto, has published a blog post outlining the pitfalls of writing tests and code that share a covariant structure. In essence, he argues that tests should be structured in a contra-variant way, decoupling them from production code and leading to a codebase that is less fragile and easier to refactor.
When somebody starts out with TDD, one issue they often run into is the fragile test problem. Martin explains that this is when test code is so tightly coupled to production code that it is almost impossible to refactor anything without having to also rewrite the tests. He emphasizes:
The structure of the tests must not reflect the structure of the production code because that much coupling makes the system fragile and obstructs refactoring. Rather, the structure of the tests must be independently designed so as to minimize the coupling to the production code.
To further explain this fragile test problem, Martin points out that it’s often a result of not understanding what refactoring is: "Refactoring is defined as a sequence of small changes that keep the tests passing at all times". When tests are coupled to the structure of the production code rather than to its behavior, refactoring stops being possible.
This covariant structure also stems from a misunderstanding of TDD, writes Martin. People often think that there should be one test class per production class, whereas, in reality, the tests should have a structure of their own. After all, it’s the behavior of an application that should be tested, not its code structure.
Martin explains that, although there may be a direct mapping between classes and tests to begin with, the two will naturally diverge as development proceeds. His first example is one where code is extracted from public methods into private methods. This leaves the test coverage of the application's behavior unchanged but introduces new methods that are not directly tested. When these methods are then extracted into new classes, the same principle applies: no new test classes need to be created, because everything is still exercised through the original tests.
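The refactoring described above can be sketched in a few lines of Python. This is only an illustration under assumed names: Invoice, _TaxCalculator, and the tax rate are hypothetical and do not come from Martin's post. The point it tries to show is that the tests talk only to the public total() method, so extracting a private method and then a helper class requires no new test class.

```python
import unittest


class _TaxCalculator:
    """Hypothetical helper class extracted during refactoring.

    It deliberately has no test class of its own.
    """

    def __init__(self, rate):
        self._rate = rate

    def tax_on(self, amount):
        return amount * self._rate


class Invoice:
    """Hypothetical production class; total() is its only public behavior."""

    def __init__(self, prices, tax_rate):
        self._prices = list(prices)
        self._tax = _TaxCalculator(tax_rate)

    def total(self):
        subtotal = self._subtotal()
        return subtotal + self._tax.tax_on(subtotal)

    def _subtotal(self):
        # Extracted from total(); exercised only indirectly by the tests.
        return sum(self._prices)


class InvoiceBehaviorTest(unittest.TestCase):
    """Written against the public API, before any extraction took place."""

    def test_total_includes_tax(self):
        self.assertEqual(Invoice([100, 50], 0.25).total(), 187.5)

    def test_empty_invoice_totals_zero(self):
        self.assertEqual(Invoice([], 0.25).total(), 0)
```

Running the tests with python -m unittest before and after the extraction should give the same result, which is what allows the change to count as a refactoring in Martin's sense.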
Over time, as development continues, more and more tests are added, each testing a specific piece of the application's behavior, writes Martin. As this full specification is built up, the application code naturally becomes more generic in order to accommodate all the required behaviors. In other words, production code should steadily become more generalized, whilst the test code should become more specific. This leads to decoupling, and thus contra-variance:
As the tests get more specific, the production code gets more generic. The two streams of code move in opposite directions along the generality axis until no new failing test can be written.
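As a rough illustration of the two streams moving in opposite directions, consider a small prime factorization function. The exercise is chosen here purely for convenience and the names are hypothetical. An early version might simply return an empty list to satisfy the first test; as more specific cases are added, the implementation is pushed towards the general algorithm below.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order.

    This is the general form the code ends up in; no single test asks for
    these loops, they emerge from satisfying all the specific cases.
    """
    factors = []
    divisor = 2
    while n > 1:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    return factors


# Each test pins down one specific behavior and says nothing about
# how the production code is structured.
def test_one_has_no_factors():
    assert prime_factors(1) == []


def test_prime_is_its_own_factor():
    assert prime_factors(2) == [2]


def test_repeated_factor_of_two():
    assert prime_factors(4) == [2, 2]


def test_repeated_odd_factor():
    assert prime_factors(9) == [3, 3]


def test_product_of_several_primes():
    assert prime_factors(2 * 3 * 5 * 7 * 11) == [2, 3, 5, 7, 11]
```

Once the general algorithm is in place, it becomes hard to write another failing test: any further specific case is already handled, which matches the stopping condition in the quote above.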
Martin's full blog post is available to read online.