Full Stack Testing: Balancing Unit and End-to-End Tests

Mar 30, 2016 15 min read

The ethos of being a full stack developer is the ability to deliver and ship a feature end-to-end. That includes testing. Tutorials and books often focus on the plumbing needed to set up full stack development and get testing working (mine brings together Angular, Rails, Bootstrap, and Postgres). What’s often missing is guidance on how to approach testing applications across the entire web development stack. Let’s dig into that in this article. We’ll learn how to get the most out of end-to-end tests, including guidance around what to test and how to keep those tests reliable and maintainable. We’ll also touch on unit tests and the role they play in our end-to-end testing strategy. But first, let’s understand the purpose of writing tests at all.

At their core, tests make sure your application is doing what you intend it to do. A test is an automated script that executes your code and checks that it did what you expected. The better your tests are, the more you can rely on them to gate your deployments. Where your tests are weak, you either need a QA team or you ship buggy software (both mean your users get value at a much slower pace than is ideal). Where your tests are strong, you can ship confidently and quickly, without approvals or slow, manual processes like QA.

You must also balance the future maintainability of the tests you write. Your application will change and thus so will your tests. Ideally, your tests only have to change proportionally to the change you are making in your software. If you are changing an error message, you don’t want to have to rewrite a lot of your test suite. But if you are completely changing a user flow, it is reasonable to expect to rewrite a lot of tests.

Practically speaking, this means you can’t do all your testing as end-to-end full-blown integration tests, but you also can’t do it as nothing but tiny little unit tests. This is about how to strike that balance.

Types of Tests

There are myriad types of tests, but for our purposes here, let’s talk about two: end-to-end and unit.

End-to-End Tests simulate user behavior. In a web application, they will start the server, fire up a browser, click around, and assert that certain things happening in the browser give us confidence our feature is working. These tests give great confidence, but they are slow, brittle, and tightly coupled to the user interface.

Unit Tests exercise units of code according to their public API. These tests involve creating an instance of a class and calling methods on it with specific inputs. You assert that the methods you called had the desired effect (typically that they returned expected outputs). These tests are fast, stable, and are not tightly coupled to many other parts of the system. They do not, however, give you confidence the overall system is working—just that the unit of code under test is working.

Your job building a feature is to find the right balance between these two tests. If you have too many end-to-end tests, future changes to your application will be painful and slow. If you have too few, subtle bugs will creep through to production, despite a fast test suite with 100% code coverage.

Start with the User Experience

Your software is in service to some user, so it’s that user who should drive your work. I would not recommend using tests to design a user experience, so figure out how the user will use the software before writing tests (either by experimental coding or working with a designer). Once you have that, start working.

Ideally, you’ll create an end-to-end test for some part of the user experience, and write code to make it pass. While writing that code, you’ll create unit tests to flesh out the specifics of the code you need to create or modify (and it’s typically the latter).

The problem is that it’s difficult to write a failing end-to-end test with no user interface artifacts (HTML) to reference. The reason is that most end-to-end tests take this form:

Find some element on the page
Interact with it in some way
Verify that interaction worked
Repeat until end of test

This means you need some specifics around the user interface elements (DOM Objects) you’ll need to interact with. When you factor in interaction design powered by JavaScript, it’s even more difficult to do without actually having the interface at least partially built.

To deal with this, get a rough outline of the UI working in the browser. Use canned data, and don’t worry about alternate flows—focus on one thing at a time. When you get that working, write a test.

In doing this, there are two things to consider: should the feature even be tested and, if so, how?

Should You Test It?

Although there is no happy path in programming, the user will experience many fewer paths through your code than are possible. For example, when a user purchases a product, we might have different ways of handling fulfillment based on the user’s address, chosen shipping method, or previous purchase history. The user experience is the same in all these cases, so this is only one flow from the user’s perspective.

Your goal, then, is to test all user flows. You want a suite of tests that simulate a user doing what you want and expect users to do, and to assert that all the experiences you want the user to have are working properly.

Given that you know what to test, how should you go about it?

How to End-to-End Test

If you are modifying a flow, modify the test of that flow. Since an end-to-end test simulates user activity, you don’t need one test for each thing you want to assert. If the user should see three important pieces of information on a checkout screen, you don’t need three tests—one test that checks all three is sufficient. So, when modifying an existing user experience, look for an existing test you can enhance.

Otherwise, you’ll need a new test. Remember, your goal is to simulate what the user would do. Be honest about how you structure the navigation and behavior in your test. Would the user really navigate directly to some deep link? Or would they click around from some common start page to get where they need to go?

It’s hard to do this, especially using the typically minimal markup needed to implement the feature. Your test needs to locate particular DOM elements to interact with, and it’s not always easy (or possible) to find the precise one you want. You need signposts.

A signpost is something you insert into the DOM specifically to locate elements of interest. As early as possible, decide on how those signposts will work. You should not use CSS classes intended for styling to locate DOM elements. Doing so means your front-end developer will break your tests by changing class names. You should also not use CSS classes or data attributes in use by the JavaScript code (e.g. a js- prefixed class). These fall victim to the same problem.

Two common techniques are to use test- prefixed CSS classes or data-test- prefixed attributes:

<section class="component dark test-checkout-confirmation">
  <!-- ... -->
</section>

<!-- OR -->

<section class="component dark" data-test-checkout-confirmation>
  <!-- ... -->
</section>

This might seem icky and…it is. But, it’s less icky than having to couple your tests to the content or presentational classes. You need to strike a balance here—don’t mindlessly tag every element with a data-test attribute. Usually, you need just a little context in which you can find elements. For example, if you want to click a button for purchasing a particular product, you really just need to locate some element that contains that product and its purchase button.

<article data-test-product="1234">
  <!-- a ton of markup -->
  <input type="submit" name="Purchase" value="Purchase">
</article>
<article data-test-product="5678">
  <!-- a ton of markup -->
  <input type="submit" name="Purchase" value="Purchase">
</article>

With the addition of the data-test-product attribute, you could locate the purchase button for product 1234 by using a CSS selector like [data-test-product='1234'] input[type='submit'].
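If you end up writing selectors like this in many tests, a small helper keeps the signpost convention in one place. The following is a minimal sketch in plain Ruby; the helper names `signpost` and `purchase_button_for` are made up for illustration and are not part of any testing library.

```ruby
# Hypothetical helpers (not from any library): build CSS selectors from
# data-test- signposts so tests never mention styling classes.

# Returns an attribute selector for a data-test-<name> signpost,
# optionally matching a specific value.
def signpost(name, value = nil)
  attribute = "data-test-#{name}"
  value.nil? ? "[#{attribute}]" : "[#{attribute}='#{value}']"
end

# Selector for the purchase button inside one product's container.
def purchase_button_for(product_id)
  "#{signpost('product', product_id)} input[type='submit']"
end

puts purchase_button_for(1234)
# prints: [data-test-product='1234'] input[type='submit']
```

The resulting string can be handed to whatever element-finding method your testing tool provides.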

Signposts mean you are making changes to your markup that exist only to support testing, so your user is downloading bytes they don’t need to get the user experience you are providing. That’s a trade-off, but it’s better than having poor test coverage (which hurts the user far more than a few extra bytes in the HTML). Just be judicious.

This technique is even more important when your page has interactions on it that change things without reloading, namely with JavaScript.

Dealing with Interaction

When every click reloads the page, end-to-end tests are more reliable, because the underlying tools know to wait for a page to reload. When user interaction simply changes the DOM, it’s harder, because there’s no obvious way to “wait for stuff to be done happening”—the tools don’t know what “stuff” is happening.

When your test needs to interact with a page that isn’t getting reloaded on user actions, you need a way to wait for the DOM manipulation to complete before you start asserting what happened. If you don’t wait, the DOM won’t be updated when your test starts asserting and your test will fail unnecessarily.

Just like we used signposts in our markup to locate DOM elements to manipulate, we can use them here, too. Any new or changed markup should have some sort of signpost that won’t be present if that interaction failed or didn’t happen. In other words, you should not have to make sleep calls in your tests to wait for DOM events—your DOM should have signposts your tests can wait for explicitly.

For example, suppose we want to test that an action generates a success message to the user. Suppose the way it’s implemented is to make an AJAX request and, when the call is completed, insert a message into the DOM. A basic implementation might do something like this:

function purchase(productId) {
  // POST the purchase, then show the outcome in the page header
  $.post(
    "/products/",
    { "id": productId }
  ).done(function() {
    $(".header").html(
      "<div class='alert-success'>Your order was placed</div>");
  }).fail(function() {
    $(".header").html(
      "<div class='alert-failure'>There was a problem</div>");
  });
}

You could configure your test to wait for an element with the CSS class of alert-success to appear, and then make an assertion about its contents. However, your test will be flaky or break if any other element on the page ever needs that class. While you could scope the lookup to the header, that just kicks the can down the road.

Instead, use a data-test- attribute:

function purchase(productId) {
  // Same flow as before, but each outcome carries a test-only signpost
  $.post(
    "/products/",
    { "id": productId }
  ).done(function() {
    $(".header").html(
      "<div data-test-purchase-successful class='alert-success'>Your order was placed</div>");
  }).fail(function() {
    $(".header").html(
      "<div data-test-purchase-failed class='alert-failure'>There was a problem</div>");
  });
}

Although this adds more bytes to your markup, it allows you to write a reliable test that can survive some visual changes. As long as the page’s flow is to display a message after a successful purchase, the visual implementation can change without breaking your test. This is what you want, and it’s a trade-off. You could sacrifice this confidence by writing the most minimal markup possible, but then you either waste time fixing tests when visuals change, rely on manual QA, or ship software you haven’t tested thoroughly.

Modern end-to-end testing tools like Capybara include functions for everything you need. There are methods to wait for DOM elements to appear before proceeding, assert the content of particular parts of the page, and interact with form elements. Most other web application stacks provide similar tools. In any case, you can couple your testing library with a headless browser like PhantomJS, and your end-to-end tests will be surprisingly fast and reliable.
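To make the idea of waiting for a signpost instead of sleeping concrete, here is a rough sketch of what such a wait does conceptually. It is plain Ruby with no browser involved: `wait_for`, `WaitTimeout`, and `FakePage` are all hypothetical stand-ins written for this illustration, not the API of Capybara or any other tool.

```ruby
# Hypothetical sketch: poll for a condition until it holds or a timeout hits.
class WaitTimeout < StandardError; end

def wait_for(timeout: 2.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result            # condition met: stop waiting at once
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    raise WaitTimeout, "condition not met within #{timeout}s" if now >= deadline
    sleep interval                     # short pause between polls
  end
end

# Stand-in for a page whose DOM is updated some time after a click.
class FakePage
  def initialize
    @signposts = []
  end

  # Simulates an AJAX callback inserting markup after a delay.
  def insert_later(signpost, delay)
    Thread.new do
      sleep delay
      @signposts << signpost
    end
  end

  def has_signpost?(signpost)
    @signposts.include?(signpost)
  end
end

page = FakePage.new
page.insert_later("data-test-purchase-successful", 0.1)
wait_for { page.has_signpost?("data-test-purchase-successful") }
puts "purchase confirmed"
```

The test proceeds as soon as the signpost shows up and fails with a clear error if it never does, which a fixed sleep cannot give you.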

It’s also worth mentioning how to do this in a distributed world.

When there is more than one “app”

When you are working on a single, monolithic system, the above techniques are all you need. If you are working in a more distributed system, however, it’s trickier. Suppose that you are working on a customer-facing application, but it must pull inventory data from another system. How do you write a test for this?

First, remember what you are testing. Your end-to-end test is testing a user interaction. This means that your end-to-end test is not responsible for asserting the functionality of the remote services, nor is it responsible for asserting that your application is properly consuming that remote service.

The best way to test the consumption of services (and that those services do what they advertise) is to use consumer-driven contracts, which is a form of unit test (at least in the broad definition I’m using for this post).

This still leaves open the issue of how to simulate the remote service during an end-to-end test. You could stand up an actual version of that service, but this does not scale. You end up having to manage that service’s internal data store as well as the services it depends on. It’s an explosion of complexity that is difficult to manage.

A popular option is to use a mocking system at the HTTP layer. In Ruby, VCR is a tool that does this. You record your interactions with a real service to establish the HTTP protocol going back and forth and, for subsequent test runs, the mocking system plays back the recorded interaction without using the network. Given that you have test coverage in your unit tests of proper consumption of the service, this works well for end-to-end tests.
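The record-and-replay idea can be sketched in a few lines. To be clear, this is not VCR's API: the `Cassette` class and the request key format below are invented for illustration, the "response" is a hard-coded hash rather than a real HTTP exchange, and recordings live only in memory.

```ruby
# Hypothetical illustration of record-and-replay. This is NOT VCR's API.
class Cassette
  def initialize
    @recordings = {}
  end

  # The first call for a request key runs the block (the "real" call) and
  # stores its response; later calls return the stored response instead.
  def use(request_key)
    @recordings.fetch(request_key) do
      @recordings[request_key] = yield
    end
  end
end

network_calls = 0
fetch_inventory = lambda do
  network_calls += 1
  { "sku" => "1234", "in_stock" => 3 }   # stand-in for a real HTTP response
end

cassette = Cassette.new
first  = cassette.use("GET /inventory/1234", &fetch_inventory)
second = cassette.use("GET /inventory/1234", &fetch_inventory)

puts first == second   # prints: true
puts network_calls     # prints: 1
```

The second lookup never touches the "network", which is what keeps replayed end-to-end tests fast and repeatable.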

Another option is to stand up simplified mock services that return canned data. Your app will make HTTP calls as it normally would, but against a canned service that just returns static, known data to your app. This requires some up-front configuration, but can work for simple interactions with a service. If your application requires storing state in the service and has a lengthy back-and-forth “conversation”, this technique is harder.

My recommendation is to try mocking HTTP first, as that’s simpler and faster.

Now that we know what to test in an end-to-end test and how to do it, what about unit tests?

Unit Tests

Recall that our criterion for what should be tested end-to-end is user flows. The idea is that while there are many possible logical flows through the system, there are many fewer that make a difference to the user experience. Unit tests are where we test the rest of those logical flows.

This allows us to assert the correct behavior of large parts of the system quickly and reliably. In other words, while we could assert every possible flow through the system with an end-to-end test, it’s not necessary, and will be slow and brittle.

For example, suppose a checkout feature has two user flows: a successful purchase, and a failed purchase, where the user must try again. That would be two end-to-end tests. Suppose further that under the covers, there are these possibilities:

The customer’s card was charged properly.
There was a problem contacting the customer’s bank, but we want to pretend it was successful and charge later.
The customer’s card was declined.
The customer’s card is expired.

That’s four flows, and so we’d want four unit tests to assert that each of these situations is handled correctly. And yes, there will be duplicate coverage. Our end-to-end test would likely set up a successful charge and a decline to handle its two user flows, so when our unit tests are written, we’d have more coverage than we technically need.

This is, again, a trade-off, but it’s important that your classes are well-covered by unit tests. This allows them to be moved, re-purposed, and changed much more easily.
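As a rough sketch of how the four charge outcomes listed above collapse into two user flows, consider the mapping below. The method name `user_flow_for` and the symbols are assumptions made for this example; real code would have more going on, but each outcome still gets its own check.

```ruby
# Hypothetical mapping from a charge outcome to what the user experiences.
def user_flow_for(charge_result)
  case charge_result
  when :charged            then :purchase_succeeded
  when :bank_unreachable   then :purchase_succeeded  # we charge later
  when :declined, :expired then :try_again
  else raise ArgumentError, "unknown charge result: #{charge_result.inspect}"
  end
end

# One check per logical flow: four checks, but only two distinct user flows.
checks = {
  charged:          :purchase_succeeded,
  bank_unreachable: :purchase_succeeded,
  declined:         :try_again,
  expired:          :try_again
}

checks.each do |charge_result, expected_flow|
  actual = user_flow_for(charge_result)
  raise "#{charge_result}: expected #{expected_flow}, got #{actual}" unless actual == expected_flow
end
puts "all four logical flows verified"
```

The two end-to-end tests would each exercise only one representative outcome per user flow; the unit-level checks cover the rest.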

There are many, many theories on how to write unit tests, far more than we can get into here. My suggestion is that you adopt a technique that makes sense to you, is easy to explain to others, and use it consistently.

The hardest part about unit tests is deciding how much of your code’s design should account for testing. This is analogous to how we added attributes and other signposts to our HTML in order to test it—those artifacts exist only because we have to test. You’ll face the same choices writing a unit test.

For example, suppose our credit-card-charging code is implemented in a class called Purchaser. Suppose that it will use a third-party-provided AwesomePayments to do the actual charging.

class Purchaser
  def charge(purchase)
    AwesomePayments.charge(purchase.customer.id, purchase.amount)
  rescue => ex
    try_again_later(purchase.id)
  end

  # ...

end

This is clear and makes sense and, in a world without unit tests, might be the ideal design. In order to test it more easily, however, we may want to control the instance of AwesomePayments:

class Purchaser
  def initialize(awesome_payments = AwesomePayments)
    @awesome_payments = awesome_payments
  end

  def charge(purchase)
    @awesome_payments.charge(purchase.customer.id, purchase.amount)
  rescue => ex
    try_again_later(purchase.id)
  end
end

Our tests can now pass in a fake implementation of AwesomePayments to have better control over the test. The tests have affected our design (although in only a small way here). You might even argue that this class is just better code. This won’t always be true.
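Here is a minimal sketch of what passing in a fake might look like. Several things are assumptions made so the example runs on its own: `FakePayments`, the `Customer` and `Purchase` structs, and the body of `try_again_later` (which simply records the purchase id here) are all invented. The constructor also drops the `AwesomePayments` default, since that third-party class doesn't exist in this snippet.

```ruby
Customer = Struct.new(:id)
Purchase = Struct.new(:id, :customer, :amount)

class Purchaser
  attr_reader :retry_queue

  def initialize(awesome_payments)
    @awesome_payments = awesome_payments
    @retry_queue = []
  end

  def charge(purchase)
    @awesome_payments.charge(purchase.customer.id, purchase.amount)
  rescue StandardError
    try_again_later(purchase.id)
  end

  private

  # Stand-in body: remember which purchases need another attempt.
  def try_again_later(purchase_id)
    @retry_queue << purchase_id
  end
end

# Hypothetical fake: records charges, or raises the error it was given.
class FakePayments
  attr_reader :charges

  def initialize(fail_with: nil)
    @fail_with = fail_with
    @charges = []
  end

  def charge(customer_id, amount)
    raise @fail_with if @fail_with
    @charges << [customer_id, amount]
  end
end

failing = FakePayments.new(fail_with: RuntimeError.new("bank unreachable"))
purchaser = Purchaser.new(failing)
purchaser.charge(Purchase.new(7, Customer.new(42), 1999))
puts purchaser.retry_queue.inspect   # prints: [7]
```

Because the fake decides whether a charge succeeds, the test can exercise the retry path without ever contacting a real payment provider.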

I would apply the same criteria here as with end-to-end tests: do what you need to make your life easier, don’t go overboard, and be judicious.

In Conclusion

Your ability to implement a feature top to bottom hinges on your ability to test it that way, too. The feedback loops where a QA team or the customers are testing your code are terrible. Even if there is a QA team, they shouldn’t find any bugs, and if you want to ship software quickly, you won’t mind writing end-to-end tests of user behavior.

About the Author

David Copeland is a programmer and author. He recently published the book “Rails, Angular, Postgres, and Bootstrap” and is also the author of "The Senior Software Engineer" and "Build Awesome Command-Line Applications in Ruby". He has over 18 years of professional development experience from managing high-performance, high-traffic systems at LivingSocial or building the engineering team at Opower to working consulting gigs large and small. Currently, he's Director of Engineering at fashion start-up Stitch Fix, building a platform that will change the retail shopping experience.
