If You Want to Deliver Fast, Your Tests Have the Last Word

Key Takeaways

  • A good testing strategy is critical not only for ensuring that code changes are safe, but also for delivering fast, reducing MTTR, and improving the developer experience.
  • Good testing strategies are especially important for teams that develop in iterations, work in environments with high uncertainty, or with frequently changing requirements.
  • Shifting from the concept of “unit” as “class or method” to “small unit of functionality” or “small module” can reduce the amount of time needed for implementing changes.
  • End-to-end tests are very expensive. Their development and maintenance require a lot of effort. They also often lead to flaky results and slow build times.
  • Adopting changes requires changing habits, and this is not easy. Willpower does not always work because we have something similar to an immune system for changes.

The problem of well-established practices

I think you will agree that software engineering is special in comparison to other professions. Things change drastically and quickly. It requires a lot of brain power just to stay up to date.

Maybe as a consequence of that, we hold on to some well-established general practices or ideas (even if they cause us trouble or don’t fit some cases). Those practices try to cover most cases, but cannot cover all of them. However, these practices give us comfort. We need to have something that doesn’t change, that feels safe and that frees our mind from the burden of thinking whether or not it actually fits. We enter autopilot mode.

The problem with that is that we want software development to behave like an assembly line: once the assembly line is built, we never touch it. We operate in the same way all the time. That may work with our CI/CD lanes for a while, but sadly it doesn’t always work well with our code.

It gets even worse: sometimes the message is passed on so many times that it loses its essence. At some point we take the practice as part of our identity, we defend it, and we don’t let different points of view in, especially if they apparently require more effort. Other times we just want to fit in and don’t want to suggest new ideas.

When we code, we need to fight against that and reflect in every case whether the practice fits the current scenario. Think of "best practices" as "best generic practices".

One example of that is the many ways in which agile can be misinterpreted, a topic I covered in a previous article.

In that piece, I argue that the essence of agile is often lost because its implementation frequently focuses on the wrong things. By definition, something agile can easily change direction and respond quickly to changes.

We try to achieve this responsiveness with practices of different natures: technical, such as CI/CD (Continuous Integration/Continuous Deployment), and strategic, such as developing in iterations. However, we often forget about agility when we deal with the core of Software Development: coding. Imagine preparing your favorite meal or dessert without the main ingredient of the recipe. This is what we are doing when we strive for agility without considering the code.

That may happen because improving code sounds scary and complicated, or because it is easy to fall into rabbit holes (all of which can be mitigated). Maybe it is just because it is not easy to see the negative effect that some solutions have on our maneuverability, turning future development into a nightmare: the opposite of agility. Instead of focusing on the code, we put too much attention on perfecting our processes (from methodologies such as Scrum), which are less important and try to solve the problem without tackling the main issue.

Finally, I suggest that we need to give more visibility to the effect that the code has on future development (and, as a consequence, on the future of the business). Hopefully, AI can help us quantify this with something like a coefficient that would not just tell us the quality, but also predict how much slower development will become based on our potential choices. I think something like this could help companies realize that they need to invest in sustainable development. The discussions about when to tackle technical debt would become history.

In this piece, I am going to focus on one coding practice that is often not questioned enough and plays a vital role in agility: the conventional way of testing.

I will also introduce "Immunity to Change", a strategy I recently discovered that can help you achieve goals and change many habits: not only coding habits but also other habits in your life. In addition, it illustrates one of the points I made in my previous article: we want to achieve maneuverability by changing some practices and habits, but we unconsciously sabotage ourselves because we are unaware of the role that coding plays in achieving that goal.

Tests - Safety Net or Straitjacket

Hopefully, we agree that (good) automated tests bring us a huge advantage: we can change code and verify quickly that no existing functionality breaks. Tests are our safety net.

However, tests are not cheap, so their cost needs to be recouped, and this only happens after some time. The more often a test runs, the more value we get out of it. But if we modify the test, its "benefit counter" resets to zero.

In addition to compensating the costs, we need to consider another thing: when we create a test, we do our best to guarantee that the test is correct, but we cannot be 100% sure. If we could be sure that any code we write is correct, we wouldn't even need tests in the first place, right?

Tests only give us confidence over time. Consider the following: imagine every time a test runs, it brings us a point of confidence. If it runs 1000 times, we win 1000 points of confidence.

If at some point we discover a bug in the test, we lose the 1000 points of confidence. We thought it was protecting us, but it wasn’t.

Similarly, if we need to refactor the logic verified by the test and the test breaks, we can no longer be sure that either the business logic or the test is correct. We also lose the "confidence" points we accumulated. It is as if we were building a new safety net.

It is an investment in the future. Humans and companies rarely think of long-term investments, but thankfully, the benefits of automated tests are widely accepted (though from time to time I hear stories of somebody who still doesn’t embrace automated testing).

Other long-term investments like keeping the code flexible through refactorings are sadly not as widely adopted. Hopefully, the not-at-all-innovative ideas mentioned in this article help remove barriers to that.

Every time I add new functionality, I’m grateful for the tests that have been written, especially in big services with lots of functionality. I cannot imagine having to verify every single one of them manually. It would be nonsense, considering how much slower I would be.

However, tests can also bring disadvantages if we don’t follow the right strategy for our case. An inappropriate testing strategy can slow down delivery and erode developer happiness and experience.

Let’s uncover them together with the following questions:

Do all your unit tests verify classes or methods in isolation?

If the answer is yes, and you are heavily mocking dependencies, you might be tired of preparing one mock after another, only to find out in some cases that the mock wasn’t realistic enough and the logic doesn’t actually work as it should. Hopefully, you find that out before going to production and you don’t have to make many changes.

Do you sometimes have the impression that tests restrict the way you can change the code?

I'd guess this is a yes too. Sometimes the existing code no longer fits the new functionality, or would become overcomplicated. You decide to refactor it. It takes 10 or 15 minutes because it is a small change and BOOOOM, many tests don’t compile anymore, or fail. Adapting the tests takes an unreasonable amount of time, much longer than adapting the code. You end up spending days on a 10 to 15-minute change. WHY???!!!! The behavior didn’t change.

If the feature is not changed, ideally tests shouldn’t break. If you adapt the tests, can you jump on the safety net again? And remember that the "benefit counter" and the "confidence counter" for the tests are reset when you modify them.

We’ll see later that in many cases we can avoid this situation, and avoid adding hacks. Hacks give us the illusion that we deliver fast, at least initially. But this is a short-term investment that will come back to bite us: the code becomes rigid and, after some months, we spend much more time understanding and changing it. Even after a few days, we won’t remember what the code does. This gets worse and worse, not to mention the poor new hires, who have even less context. However, if your code base is littered with hacks, you won’t need to worry about them for too long, as they probably won’t stay long.

Does your test suite have lots of integration and e2e (end-to-end) tests?

Integration tests tend to survive more changes than unit tests, but they are much slower.

If you have developed many e2e tests, you have probably gone through a painful experience:

  • It can be really slow to develop them and to find out what the issue is when they fail.
  • Sometimes, just configuring and starting up the environment is a headache.
  • They tend to be very flaky. Sometimes it is a network error, sometimes a browser update, etc.
  • Some of the components are owned by other teams or companies. Any temporary issues they are having will affect you even if you didn’t do anything.
  • Running the test suite to verify changes goes from seconds to minutes or even hours as the codebase grows.

The pain becomes worse if the e2e tests involve many components connected through the network (but please don’t build a Big Ball of Mud to avoid this).

So a bad test strategy on its own can ruin your delivery. Unit tests can "prevent us" from writing better code. It doesn’t matter if your "agile" rituals are on point. You feel as if you were wearing a straitjacket (or more than one, if you get paranoid and keep adding tests, trying to be 200% safe). Meanwhile, the sharp teeth of your competitors, who can adapt faster to changes in the market’s ocean, are getting closer, and the straitjacket won’t even let you move. Nor will it be of any help against those teeth.

So, some key questions I would like you to ask yourself after reading this:

  • Testing class by class, method by method. Does it always make sense? What alternatives do we have that can help us change code more easily and faster?
  • Integration and e2e tests. When do they make sense?

An example

The conventional way

Let’s look at a simplification of a scenario I often see. A backend app implemented using Spring. I typically see a 1-1 mapping between production and test classes, like in the picture:

[Figure: one test class per production class (1-1 mapping)]

The "class with the main logic" is often called a "service". Sometimes the service is specific per domain entity, sometimes the service represents a more abstract concept.

Sometimes the "main logic" also spreads to framework components such as Controllers or Listeners. Not only does the line between the framework and the business logic become blurry, but so does the line between what is better tested in unit tests and what is better tested in integration tests. As a result, I sometimes see the same scenarios covered in both unit tests and integration tests. This duplication is useless and costly.

Having one unit test per class makes sense when the classes contain complex functionality. When that happens, identifying errors is difficult, so having small pieces of code helps to locate issues faster. This is actually one of the arguments that support unit tests. Keep in mind that I’m saying "complex functionality" and not "complex logic" because it is always possible to implement simple functionality in a complex way.

The problem

Now imagine you need to add a feature or change the logic because the requirements changed. That rarely happens, right? ;). The current implementation doesn’t fit the new idea or context. As almost always, there are two options: a dirty hack or a refactoring. We of course always choose refactoring:

[Figure: the class with the main logic after the refactoring, split into two classes]

Imagine the split was clean (just move some methods to the new class) and the old class keeps a reference to the new class. Does it make sense to split the test class in order to keep the 1-1 mapping? What do we win? We could just keep the test as it is (only a slight change is needed).
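As a minimal sketch of that clean split, assume a hypothetical pricing class (every name below is invented for illustration and does not come from a real code base). The discount rule moves to a new class that the old class keeps a reference to, and a single check written against the public behavior passes before and after the refactoring without being split or rewritten:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical contract of the "class with the main logic".
interface OrderPricing {
    int totalCents(List<Integer> itemPricesCents, boolean loyalCustomer);
}

// Before the refactoring: all the logic lives in one class.
class OrderPricingBefore implements OrderPricing {
    @Override
    public int totalCents(List<Integer> itemPricesCents, boolean loyalCustomer) {
        int sum = 0;
        for (int price : itemPricesCents) {
            sum += price;
        }
        // Assumed rule for the example: loyal customers get 10% off.
        return loyalCustomer ? sum - sum / 10 : sum;
    }
}

// After the refactoring: the discount rule moved to a new internal class.
class LoyaltyDiscount {
    int apply(int amountCents, boolean loyalCustomer) {
        return loyalCustomer ? amountCents - amountCents / 10 : amountCents;
    }
}

// The old class keeps a reference to the new class.
class OrderPricingAfter implements OrderPricing {
    private final LoyaltyDiscount discount = new LoyaltyDiscount();

    @Override
    public int totalCents(List<Integer> itemPricesCents, boolean loyalCustomer) {
        int sum = 0;
        for (int price : itemPricesCents) {
            sum += price;
        }
        return discount.apply(sum, loyalCustomer);
    }
}

// One behavior-level check, reused unchanged for both versions.
class OrderPricingBehaviorCheck {
    static boolean behavesAsExpected(OrderPricing pricing) {
        List<Integer> items = Arrays.asList(1000, 2500);
        return pricing.totalCents(items, false) == 3500
            && pricing.totalCents(items, true) == 3150;
    }
}
```

Here the only difference between the two runs is which implementation is instantiated. In a real refactoring the class would normally keep its name, so a test written this way would not need to know that LoyaltyDiscount exists.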

If the refactoring was more complex, imagine that the logic (implementation details) was adapted in 20 minutes. As we already mentioned, tests are not so quick to change.

Regarding the tests we see two things:

  • The unit tests broke because they are too tied to implementation details. And sadly we often need to spend a lot of time reworking the tests. We already spoke about the consequences of that.
  • Integration tests resisted the refactoring. We might think: "If we only had integration tests we would have saved a lot of time". But integration tests are slow to develop and run, so maybe we even lose more time, and the build pipeline becomes slower, which affects delivery time and our ability to recover from incidents.

We see each type has some benefits. So maybe we can combine the best of integration and conventional unit tests. Traditionally a unit is a class or a method. At least this is how I learned it a long time ago. It may seem like the idea of not testing a class in isolation is wrong. But maybe the concept of "unit" could benefit from its own "refactoring".

For a long time, I wondered why I didn’t see people talking about this. Eventually I found some articles and videos on the topic, and I got a clearer picture of this and of other concepts, such as the different types of test doubles, when I read the book Unit Testing Principles, Practices, and Patterns by Vladimir Khorikov.

So if it is not a class, what is a unit? What about a piece of functionality, spread across several methods or classes? You might say, "wait, that sounds like an integration test." Well, not exactly. I will borrow some definitions from the book:

A unit test:

  • Verifies a small piece of code
  • Does it quickly
  • And does it in an isolated manner

An integration test doesn’t meet at least one of the mentioned criteria.

A good unit test:

  • Protects against regressions
  • Is resistant to refactoring
  • Provides fast feedback
  • Is easy to maintain

So the concept of "unit" now seems more abstract and more flexible. Also, under integration tests, we can consider a lot of things such as system tests, e2e tests, etc.

A More Flexible Way

So, let’s move to a coarser level of granularity. Let’s group the initial logic in a module and make some more changes:

[Figure: the domain logic grouped into a module, separate from the framework components]

The first thing we did was to group the domain logic in a relatively small module. If it helps, think of a module as a microservice inside a microservice. How small? As with almost every decision, there is no single rule to follow. Keep in mind that the example is a simplification: there might be several entities involved or none, and there could be more classes around.

It is important to keep in mind the trade-offs:

  • We saw that when testing small pieces of code, flexibility suffers.
  • On the other hand, if it is too big, the test will be very complicated to implement, and errors will be more difficult to find. Ideally, the module represents a use case or a small piece of functionality. If that involves many classes, maybe you could find another subdivision that makes sense.

Advantages:

  • The code is more flexible. In the most frequent case, rearranging things within the module won’t require changing tests.
  • Having small cohesive pieces is easier to understand than one big chaotic thing. It is the same idea as with microservices versus the big ball of mud.
  • More things are tested in the unit test, so fewer integration tests are needed. This gives us:
    • Faster feedback and shorter build times.
    • The resulting unit tests are easier to debug and more stable than integration tests.
    • A clearer boundary for what to test in integration tests. The business logic is exclusively tested in unit tests.
    • Less processing is required so less CO2 is emitted.
  • We save work since we don’t need to mock internal classes. Instead of mocking many classes, we mock modules and framework components, so tests don’t require the application to start.
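To make the idea of testing a module as a unit more concrete, here is a rough sketch under stated assumptions: the order-acceptance domain, the class names, and the rules are all hypothetical. The check talks only to the module's entry point, the internal classes are real (not mocked), and the only test double is a hand-written fake for the dependency that lives outside the module:

```java
// Port to something outside the module, e.g. a remote inventory service (assumed).
interface StockLevels {
    int availableUnits(String sku);
}

// Internal class of the module: never mocked, never referenced by tests.
class QuantityLimits {
    private static final int MAX_UNITS_PER_ORDER = 5;

    boolean allows(int requestedUnits) {
        return requestedUnits > 0 && requestedUnits <= MAX_UNITS_PER_ORDER;
    }
}

// Another internal class; it can be merged, split, or renamed freely.
class AvailabilityRule {
    private final StockLevels stock;

    AvailabilityRule(StockLevels stock) {
        this.stock = stock;
    }

    boolean allows(String sku, int requestedUnits) {
        return stock.availableUnits(sku) >= requestedUnits;
    }
}

// The module's only entry point: this is what the unit test exercises.
class OrderAcceptance {
    private final QuantityLimits limits = new QuantityLimits();
    private final AvailabilityRule availability;

    OrderAcceptance(StockLevels stock) {
        this.availability = new AvailabilityRule(stock);
    }

    String decide(String sku, int requestedUnits) {
        if (!limits.allows(requestedUnits)) {
            return "REJECTED_QUANTITY";
        }
        if (!availability.allows(sku, requestedUnits)) {
            return "REJECTED_OUT_OF_STOCK";
        }
        return "ACCEPTED";
    }
}

class OrderAcceptanceModuleCheck {
    static boolean passes() {
        // Fake for the external port: three books in stock, nothing else.
        StockLevels fakeStock = sku -> "book".equals(sku) ? 3 : 0;
        OrderAcceptance module = new OrderAcceptance(fakeStock);

        return "ACCEPTED".equals(module.decide("book", 2))
            && "REJECTED_OUT_OF_STOCK".equals(module.decide("book", 4))
            && "REJECTED_QUANTITY".equals(module.decide("book", 6))
            && "REJECTED_OUT_OF_STOCK".equals(module.decide("pen", 1));
    }
}
```

Because nothing in the check names QuantityLimits or AvailabilityRule, rearranging those classes inside the module should leave it untouched, and no application context needs to start for it to run.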

Disadvantages:

  • It requires more thinking. Naming and grouping things so that everything makes sense is complicated. Hopefully, AI tools from OpenAI, GitHub, or Tabnine will soon be able to do this for us, but until then we need to take care of it ourselves.
  • The unit tests are a bit more complex than conventional unit tests, but as long as the module is small, it shouldn’t be a problem.
  • It is not always applicable. It is not always easy to group classes in modules with a clear separation of concerns, such as in the case of the Chain of Responsibility pattern when there is a high number of unrelated elements.
  • IDEs might not help you find the tests that easily.

Keep in mind that it might make sense to have a specific test for really complex functionality.

The second thing we did was to separate the domain from the framework components.

One of the functions of frameworks like Spring is to glue together the different elements of an application. What we want to do here is similar to the hexagonal architecture, a.k.a. ports and adapters: controllers, listeners, filters, DAOs, or other framework constructs act as adapters that connect the domain logic (application core) to the outside world. Those components ideally don’t contain domain knowledge. For example, controllers, listeners, and filters contain exactly one call to the domain logic (and, if needed, another call to logic that maps the data to a format friendlier to the domain model).

  • The domain logic is the most important part of our code and we need to test it intensively, and as easily as possible.
  • We cannot forget about the framework, but the framework has been intensively tested by its developers, so it becomes secondary for us. We just need to verify that we configured it correctly.

Advantages:

  • We won’t need to prepare complex input such as bytes, JSON, or framework entities such as HttpServletRequest for unit tests.
  • Our logic is close together. The code is more cohesive and clearer without mixing the framework domain with the business domain.
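The separation between framework components and domain logic described above could look roughly like the following sketch. It is plain Java so that it runs without Spring: LoanEligibilityAdapter merely stands in for a controller, and the loan rule, the parameter names, and the JSON strings are assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Domain logic: plain Java, no framework types, easy to unit test.
class LoanEligibility {
    private static final int MINIMUM_AGE = 18;
    private static final int MINIMUM_MONTHLY_INCOME_CENTS = 150_000;

    boolean isEligible(int age, int monthlyIncomeCents) {
        return age >= MINIMUM_AGE && monthlyIncomeCents >= MINIMUM_MONTHLY_INCOME_CENTS;
    }
}

// Stand-in for a framework controller: one mapping step plus one call
// into the domain. It holds no business rules of its own.
class LoanEligibilityAdapter {
    private final LoanEligibility domain;

    LoanEligibilityAdapter(LoanEligibility domain) {
        this.domain = domain;
    }

    String handle(Map<String, String> requestParams) {
        // Map the raw request data to values the domain understands.
        int age = Integer.parseInt(requestParams.get("age"));
        int income = Integer.parseInt(requestParams.get("incomeCents"));
        // Single call to the domain logic.
        boolean eligible = domain.isEligible(age, income);
        return eligible ? "{\"eligible\":true}" : "{\"eligible\":false}";
    }
}

class LoanEligibilityDemo {
    public static void main(String[] args) {
        // The domain is checked with plain values: no JSON, no request objects.
        LoanEligibility domain = new LoanEligibility();
        System.out.println(domain.isEligible(30, 200_000));
        System.out.println(domain.isEligible(17, 200_000));

        // The adapter only needs a thin check that the wiring and mapping work.
        Map<String, String> params = new HashMap<>();
        params.put("age", "30");
        params.put("incomeCents", "200000");
        System.out.println(new LoanEligibilityAdapter(domain).handle(params));
    }
}
```

With this shape, every eligibility scenario can be covered by fast tests against LoanEligibility, while the adapter is left for a small number of integration tests that confirm the framework is configured correctly.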

An extension of separating the domain and the framework logic is to keep the domain logic for each functionality close together, to ensure cohesion. All the logic related to this functionality is implemented in this module, not spread around several modules.

I have lived through that nightmare: digging through the codebase of a monolith to find all the places that needed to change. We debugged intensively for months, and the main change was done in two days.

Application level integration tests

We just need integration tests for anything that is not tested in unit tests, such as framework functionality: endpoint configuration, serialization and deserialization of data and errors, data access, remote calls, and auth. A smoke test for the happy case might also be worthwhile.

End-to-End Tests

The last point I want to mention is regarding e2e tests. e2e tests are WAY MORE expensive than integration tests, so use them with care:

  • For critical scenarios:
    • Is it life-or-death?
    • Are payments involved?
    • Does the company lose 100K € of revenue or risk significant reputation damage if something is broken for five minutes?
    • Could PII (Personally Identifiable Information) be leaked?
    • Other significant reasons.
  • For other cases, maybe just a test for the happy path is enough.
  • In general, internal applications probably won’t need them.

An alternative to e2e tests for those non-critical scenarios could be to use contract-driven tests.
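The core idea behind contract-driven tests can be illustrated with a toy sketch. It deliberately does not use any real contract-testing library, and the payment-status contract and its field names are hypothetical. The consumer states which fields it relies on and with which types, and the provider checks a sample response against that statement in its own fast test suite, instead of both sides being started together for an e2e test:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy contract: the fields a consumer relies on, with their expected types.
class PaymentStatusContract {

    // Consumer side: "this is what I read from your response".
    static Map<String, Class<?>> requiredFields() {
        Map<String, Class<?>> fields = new LinkedHashMap<>();
        fields.put("paymentId", String.class);
        fields.put("status", String.class);
        fields.put("amountCents", Integer.class);
        return fields;
    }

    // Provider side: verify that a sample response honors the contract.
    // Extra fields are allowed; missing or wrongly typed ones are not.
    static boolean isSatisfiedBy(Map<String, Object> response) {
        for (Map.Entry<String, Class<?>> field : requiredFields().entrySet()) {
            Object value = response.get(field.getKey());
            if (!field.getValue().isInstance(value)) {
                return false;
            }
        }
        return true;
    }
}
```

A provider could add a field without breaking this check, but removing amountCents or changing it to a string would fail in the provider's own build, before any consumer is affected.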

A Strategy for Changing Habits

I can remember several cases in which this approach would have helped me. The most significant one was a time in which we had to develop an algorithm for a bidding system.

The first implementation of the algorithm initially looked good, but we started discovering tricky scenarios every week after the release. We fixed them one by one, the implementation soon became too complex, and we ended up having an incident.

After that, we went over it again and found a different approach. We ended up with 75% less code and the implementation was way simpler. That was the good part.

The sad part is that the unit tests checked many methods individually and we had a bunch of them. If we had tested the logic as a unit of functionality, we would have saved 12 days of work (out of 15) and a lot of frustration caused by going over all the tests again.

So, on top of all the mentioned advantages, this approach to testing can increase your happiness as a developer/team. Other strategies which aim to deliver faster require convincing people outside your nearest environment, and that can be very difficult. This one is all in your hands.

In summary:

  • Make sure that the test suite is fast.
  • Make sure that changes in code require as few changes as possible in tests.
  • Favor simpler and faster tests: Unit over Integration.
  • Keep related domain logic close together and out of the framework to help testing and reduce the blast radius of future changes.
  • e2e tests only for business-critical features.

And keep in mind that:

  • Every test requires effort to be developed and maintained but not every test adds value. It is better to not write a test at all than to write a bad test.
  • It is impossible to make sure that the code is 100% safe. Tests help us reduce bugs but in some cases, we need to live with that uncertainty.

I’m not saying that testing in this way is good for everybody and all the time. You need to evaluate if this could help you and in which cases. As I mentioned, it can especially be helpful in agile environments, where there is a lot of experimentation, the domain is created and evolved in iterations, and new functionalities are regularly incorporated. Many of those actions would benefit from refactorings.

Some of these advantages can also be achieved with TDD (test-driven development) if we do it consciously. Sadly, once again, we tend to focus too much on the tool or technique without understanding its essence and what it was originally meant to achieve, so we get nothing out of it and end up abandoning it.

Immunity to Change

Adopting practices like this one requires changing habits and we know how difficult this can be. We think that willpower is what we need, but it is not enough. I want to share an approach I recently discovered. It is described in the book Immunity to Change: How to Overcome It and Unlock the Potential in Yourself and Your Organization, by Lisa Lahey, Ed.D. and Robert Kegan, Ph.D., members of the Harvard Graduate School of Education.

We often try to implement changes through willpower alone. Their theory is that this may not work because we have something similar to an "immune system" which works against changes, sabotaging all our attempts. So to adopt a change, we first need to uncover that "immune system" and work on its roots.

This "immune system" was developed in the past. Over time, we try to reach our goals. To achieve them, we follow a series of steps:

  1. We use reason to find a way to achieve them and make some assumptions (our view of the world).
  2. Based on those reasons and assumptions, we adopt some habits that will help us reach those goals. The most amazing thing is that we may not even be aware of them!
  3. Once we reach the goals, we keep the habits even if they don’t help us anymore.

Those concepts form the base of the process they implemented. I’ll describe the steps very briefly:

  1. Build a table with four columns.
  2. In the first column, list the goals you are trying to achieve, how important they are, and why they are that important.
  3. In the second column, list the things that you are doing or not doing which are preventing you from reaching the goals mentioned in the first column.
  4. In the third column, list the reasons or commitments that explain why you are doing the things in the second column.
  5. In the fourth column, list the assumptions that you made, which link the reasons in column three with the habits in column two.
  6. The last point is about deciding if the assumptions are still valid and, if not, correcting them.

If we find that the assumptions are not valid anymore, we work on them. That makes it easier to adopt the change.

In the podcast "Dare to Lead", Lisa Lahey and Brené Brown put Immunity to Change into practice through an example from Brené’s own life. It has two parts (part one and part two). I often listen to podcasts while doing other tasks, but this one was worth listening to very attentively.

The podcast starts with a powerful question from Brené: "Why do we all want to transform and no one wants to change?". They continue talking about how we tend to associate failure to change with not wanting it enough or fake intentions. They mention that intentions are not everything, because people whose lives are at risk and who want to live, still sometimes fail to change.

Later they work on one example, which is probably familiar to us: Brené says that she wants to get more disciplined with the team about having regular meetings.

Lisa guides Brené with some questions to help her fill in the columns. For Brené, regular meetings are key to making her life easier and to achieving success. Despite that, and even though it is something she can change on her own, she had not been able to implement the change.

Brené is then surprised to discover the unvoiced commitments, assumptions, and worries that are sabotaging her and led to the current situation. She says that she learned in the past that discipline and creativity are mutually exclusive, and thus she believes that with regular meetings she will lose time for what she enjoys most: being creative. So she skips those meetings, but ends up having many one-off meetings as a consequence, because, she says, she wants to show that she is an accessible leader. This has helped her reach many goals, but now she spends too much time on it, and she is realizing that regular meetings won’t make things worse and could save her a lot of time.

In the end, Lisa suggests that she needs to find ways to validate this new belief that discipline and creativity are indeed compatible. The goal is to create new neural pathways and override the old ones that have been carved for years.

I think we can benefit from this strategy in two ways. First, this sabotage from our "immune system" illustrates what we are doing when we try to be agile while ignoring how our code looks. Second, I believe it can be applied to many aspects of our lives, including the way we test.

If we take switching to this new testing strategy as an example, we would need to find out what we want to achieve with the change, and why it is more important than other things. For me, in my current context, flexibility is critical.

Next come the things we might be doing that work against the goal, why we do them, and the assumptions that drive them. We may think that our life is easier if we keep testing in the usual way, because we just need to continue by inertia. We also find reasons not to change: "This is the way we’ve always done testing, and how we were taught to do it. Everybody is doing it this way, so it has to be correct".

So to fight that, we need to voice the negative side effects and rewrite those beliefs, then find a way to prove that the assumptions are wrong, and finally implement the change.

If you want more details, you can check out the resources linked in the above-mentioned Dare to Lead podcast (part one and part two), their website, or their book.
