
End-to-End Testing Considered Harmful: A Q&A with Steve Smith

InfoQ recently sat down with Steve Smith and discussed the ideas behind his recent blog post “End-to-End Testing Considered Harmful”. Smith talked about release testing being a form of ‘risk management theatre’, discussed the benefit of unit and acceptance testing within this context, and stressed the value of monitoring at runtime versus the typically fragile and slow-running implementation of end-to-end testing.

InfoQ: Hi Steve, thanks for talking to InfoQ today. Could you introduce yourself, and also briefly explain the premise behind your latest blog post "End-to-end Testing Considered Harmful" please?

Smith: I'm a Continuous Delivery consultant based in London, and as a developer I've done a lot of unit testing, acceptance testing, smoke testing, and monitoring over the years without resorting to a plethora of end-to-end tests. I've never been a fan of end-to-end tests as they are so slow and brittle, and over the past few years whenever a client has said "we need to do End-To-End Testing for 1 day/2 weeks/1 month before the production release" it has made me very sad.

Throughout 2014 I wrote about Release Testing, a form of End-To-End Testing, being Risk Management Theatre, and this year I've gone further. When a client said in the spring "we need End-To-End Performance Testing before our production releases", I asked them why they were willing to spend their own money to test a third-party system for the sake of the third party, and they relented.

When the subject resurfaced in the summer a colleague and I again pointed out what a bad idea it was, and we recommended production monitoring of third parties instead. After that incident I was so annoyed I spent the next 4 months of evenings reading and thinking about End-To-End Testing, and the final (long) version is available via my blog. Initial feedback has been good and I'm thinking of doing a talk next year about it.

InfoQ: You mention in the article that end-to-end testing will not check 'behaviours' of the system under test. Could you explain more about this, and the reasoning behind this assertion?

Smith: So for brevity I simplify build-time automated checks into unit tests, acceptance tests, and end-to-end tests, and I classify them according to Jerry Weinberg's terminology in "Perfect Software: And Other Illusions About Testing". Jerry describes a difference between checking intent against implementation, i.e. did the developers do what they thought they did (unit tests), and checking implementation against requirements, i.e. did the developers do what was required (acceptance and end-to-end tests). An acceptance test or an end-to-end test will check requirements have been met by following a particular pathway through the System Under Test, but it will not check all the different behavioural possibilities within that pathway. That is where unit tests come in.
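That distinction can be sketched in a few lines of Python. The `discount` function and its figures below are purely hypothetical, not anything from the interview; they only illustrate how an acceptance-style check follows one pathway while unit-style checks probe the behavioural possibilities within it.

```python
# Hypothetical function used only to illustrate the distinction.
def discount(order_total, is_member):
    """Members get 10% off orders of 100 or more; otherwise no discount."""
    if is_member and order_total >= 100:
        return round(order_total * 0.9, 2)
    return order_total

# Acceptance-style check: one pathway through a requirement
# ("a member ordering 120 pays 108").
assert discount(120, is_member=True) == 108.0

# Unit-style checks: the behavioural possibilities within that pathway,
# including boundaries the acceptance pathway never visits.
assert discount(100, is_member=True) == 90.0     # boundary: exactly 100
assert discount(99.99, is_member=True) == 99.99  # just under the threshold
assert discount(120, is_member=False) == 120     # non-member path
```

The acceptance check alone would pass even if the boundary conditions were wrong, which is exactly why the behavioural coverage belongs in unit tests.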

InfoQ: You discuss that creating a fragile system that prioritises long mean time between failures (MTBF) over low mean time to recovery (MTTR) can lead to risk in regards to exposure to 'Black Swan' events. How can developers and operators convince management of the reality (and inherent risk) of this issue?

Smith: That is a good question. Nassim Nicholas Taleb has written in Black Swan and Antifragile how hard it is for people to be rational about probabilities, and the idea that "unusual things happen - usually". My approach with clients is:

1. calculate the cost per unit time of an event e.g. "if the third-party payments system failed, how much money would that cost us per minute"

2. calculate the duration of an event e.g. "if the third-party payments systems failed, for how long would it be unavailable"

3. calculate the probability of that event e.g. "how likely is the third-party payment system to fail"

4. risk = (1) × (2) × (3), i.e. cost per unit time × duration × probability

People are often comfortable estimating (1) as they know their business, and (2) as they know third-party lead times. (3) is more difficult, but even a rough estimate can give a clear indication, plus show people how important it is to work on shrinking their own lead time to market.
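The three-step estimate above multiplies out to an expected loss figure. The sketch below uses invented numbers (they are not from the interview) just to show the arithmetic:

```python
# Illustrative figures only; the interview gives no actual numbers.
cost_per_minute = 500.0    # (1) money lost per minute if the payments system fails
duration_minutes = 90.0    # (2) expected outage length, from third-party lead times
failures_per_year = 2.0    # (3) rough estimate of how often the event occurs

# risk = (1) x (2) x (3): expected annual loss from this event
expected_annual_loss = cost_per_minute * duration_minutes * failures_per_year
print(expected_annual_loss)  # 90000.0
```

Even with a crude value for (3), the product makes the exposure concrete enough to compare against the cost of mitigation.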

InfoQ: With the rise in popularity of 'programmable infrastructure', how important do you believe it is to integrate infrastructure changes within a build pipeline? Have you got any advice on how to do this (e.g. should there be separate application/infra pipelines)?

Smith: Automated infrastructure is clearly a good thing, and it could certainly help with a low Mean Time To Repair as it would enable infrastructure errors to be quickly rolled back. I've seen automated infrastructure versions bundled up with application versions (using something like an Aggregate Artifact) or an entirely separate pipeline with its own rate of change. It depends on the extent to which the application is tied to the infrastructure.

InfoQ: Can you offer any advice for someone who is responsible for improving the quality of testing within a legacy application that contains a large amount of end-to-end tests? Is there any equivalent of the popular microservice vs legacy app 'strangler pattern' for a fragile test suite?

Smith: That is a very good question. I keynoted Continuous Delivery Conference 2015 in the Netherlands recently, and in a Q&A I was asked exactly the same question and managed to resist hiding behind the normal answer of "it depends". Replacing an enormous number of automated end-to-end tests with unit tests and acceptance tests will take a long time and will be a hard sell to a product owner, so I agree with your suggestion of the Strangler Application.

I would create a new application surrounding the old legacy application, with unit tests and acceptance tests of the new application and a gradual removal of end-to-end tests from the legacy application as it is incrementally replaced. That requires a substantial investment, but as the new application replaces the legacy application people will notice the shorter lead times and higher quality that results from a move away from End-To-End Testing.

InfoQ: Thanks for taking the time to talk to InfoQ. Is there anything else you would be keen to share with our readers?

Smith: Please don't use end-to-end tests at build time. If you want to test your code, use unit tests or acceptance tests at build time. If you want to test a third party your code depends upon, use monitoring at runtime.
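A minimal sketch of what "monitoring at runtime" could look like for a third-party dependency, using only the Python standard library. The URL and the probe are hypothetical; the `opener` parameter is injectable so the check can be exercised without real network access:

```python
import urllib.request
import urllib.error

def check_third_party(url, timeout=5, opener=urllib.request.urlopen):
    """Synthetic-monitoring probe for a third-party endpoint.

    Returns (healthy, detail). In production this would run on a schedule
    and feed an alerting system rather than a test suite.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            return status == 200, "HTTP %d" % status
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)

# Exercising the probe with a stubbed response instead of a live call:
class FakeResponse:
    status = 200
    def __enter__(self):
        return self
    def __exit__(self, *args):
        return False

healthy, detail = check_third_party(
    "https://payments.example.com/health",  # hypothetical endpoint
    opener=lambda url, timeout: FakeResponse(),
)
print(healthy, detail)  # True HTTP 200
```

The point is the shift in where the check runs: instead of a slow end-to-end test blocking the build, the same concern is covered continuously in production.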

Additional information on Steve Smith's latest thoughts about end-to-end testing can be found in his recent blog article "End-to-End Testing Considered Harmful".
