Are Canary Releases an Alternative to Testers?

Key Takeaways

  • A software release always carries both risks and benefits. Businesses focus on reducing the risks and cashing in on the benefits.
  • A canary release is the same as a regular release in that software is pushed to production, even if only to a small subset of users.
  • A canary release with a small percentage of users doesn't guarantee detection of all issues, because users won't test the app the way testers do and may not use all the features during the canary window.
  • In certain domains, there is a risk of reputational damage, regulatory violations and lawsuits if a canary release were to impact the user.
  • Given the associated risks, having a canary release capability doesn’t remove the need for an exploratory tester.

Canary releases got their name from the use of canaries (tiny birds) in coal mines to detect toxic gases before they hurt humans. Workers would carry a canary in a birdcage as they walked into a mine and worked inside it. If they noticed that the bird had stopped chirping, they would realize that there was probably toxic gas and would flee to safety. When we apply the same tactic to software development, the users are the canaries. We release the product to a small number of users who detect bugs, issues, possible data loss, and discrepancies (a.k.a. the toxic gas!), or otherwise report dissatisfaction, before the release hurts the whole customer base.

In the present era, we can roll out features to just certain users. For example, the canary testing customers could have opted in for a Beta testing program or they could have been selected by setting some flags to allow only certain types of users to see the new version. A Product Owner decides to make a canary release to assess how customers respond to the new features of the software. They make the canary release available to a small number of users first and then, depending on the response, the release is made available to a gradually expanding group of users. For example, the build could first be released to internal users, then to some specific profiles, and so on.
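
The staged rollout described above can be sketched in code. The following is a minimal Python sketch, not a description of any particular feature-flag product: the stage names, the percentages, and the `bucket_for` and `sees_canary` helpers are all hypothetical. It assumes internal users come first, beta opt-ins join once the release leaves the internal stage, and everyone else is admitted by a stable hash of their user ID, so the audience only grows as the percentage rises.

```python
import hashlib

# Hypothetical rollout stages; each stage widens the audience.
ROLLOUT_STAGES = [
    {"name": "internal", "percent": 0, "internal_only": True},
    {"name": "early", "percent": 5, "internal_only": False},
    {"name": "wider", "percent": 25, "internal_only": False},
    {"name": "everyone", "percent": 100, "internal_only": False},
]


def bucket_for(user_id: str, release: str) -> int:
    """Map a user to a stable bucket in 0..99 for a given release."""
    digest = hashlib.sha256(f"{release}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def sees_canary(user_id: str, is_internal: bool, opted_in_beta: bool,
                release: str, stage: dict) -> bool:
    """Decide whether a user gets the canary build at the given stage."""
    if is_internal:
        return True   # internal users are always first in line
    if stage["internal_only"]:
        return False  # nobody else is admitted during the internal stage
    if opted_in_beta:
        return True   # beta opt-ins join once the release leaves internal
    # Everyone else is admitted by bucket, so a user who is in at 5%
    # is still in at 25% and at 100%.
    return bucket_for(user_id, release) < stage["percent"]
```

Hashing the release name together with the user ID is a design choice in this sketch: it keeps a user's assignment stable within one release while avoiding the same small group being the first canaries for every release.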

The benefits of Canary Releases

A canary release helps us determine whether or not a certain feature or approach appeals to customers. 

A well-designed canary test suite can help discover what users think about the new feature. Some questions that we expect responses to include “Is it worth rolling out to the larger audience?” and “Are users even noticing the new feature?”

The good part about canary releases is that there is always an option to roll back if we see problems in the live environment without having impacted the entire customer base. In the mining example, miners would carry caged canary birds in the mine and if a bird became ill or died, the mine workers would evacuate immediately. In software terms, this is a rollback!

Canary Releases can’t replace testing

There is a growing trend of reducing internal testing once canary releases are adopted. Here are some example justifications that arise: 

  • “We can always roll back if we see problems.”
  • “Developers have mocks in place.”
  • “Developers have checked all the request-response field mappings, and redirect URLs.”
  • “It’s a simple app and doesn’t affect the user’s money or health, so testing directly in the live environment is fine.”

There is a difference between what a canary release is vs. what people interpret it to be. As with many terms and concepts, the understanding of a canary release varies across organizations and teams. It has naturally followed that teams and initiatives end up interpreting canary releases to meet their pressing needs. In some cases, teams and initiatives also design processes based on what they can manage socially, politically, and in terms of capability and budget.

A tester would discover issues that users wouldn’t know to test for, and which could cause them to walk away from the app if they encounter those issues even once. As anyone from marketing can confirm, customer acquisition is expensive, but retaining a dissatisfied user is even more so. 

There is always the risk that an untested release damages user data. While we can always roll back a canary release, we’d still need to deal with damaged data or transactions that can’t be rolled back.

Developers also depend on mocks as a replacement for external libraries or others’ code when they write code and automated tests. Mocks are code-level substitutes for other components. Stubs are like mocks, but they represent external software systems. However, mocks and stubs don’t always bring out all the risks around the acceptance criteria, so one can’t depend on them to reduce the amount of manual testing.

Thanks to an increased understanding and adoption of automated tests, we now see reduced verification times. This enables a team to understand very rapidly whether or not their changes have impacted existing features. Over a period of time, the reliability of the verification suite helps us gain confidence in designing canary releases for canary testing.

Speedy quality verification of code under development depends on how thoroughly the automated verification suite covers the use cases. Such a suite is only as good as the scenarios it caters to. Automated verification maturity takes time and continuous diligence, and is usually achieved only when the suite is created alongside the codebase and maintained from then on.

With exploratory testing, we lower the chances of bad or negative user experiences while we subject users to a canary release. To verify that new features are working well, one must test before a release to ensure that features work as intended. Exploratory tests needn’t be time-consuming, nor do they need to repeat automated verification.

The automation suite covers only the known. With the help of exploratory testing, we deal with the unknowns, which can later be covered by automation. Exploratory testers use their knowledge and experience to predict when the system may behave unexpectedly, and it’s a continuous process that gives quick feedback. Exploratory testing also gives insights into the user’s perspective: a computer may not be able to judge, for example, whether the UI of a product is attractive and at the same time easy to use and navigate. It’s a foundation for more advanced automated tests.

Testing before a canary release is as important as testing before any other release. A team sometimes needs to provide a special “rollback” build that would provide support for the incompatible canary-specific changes, or provide database scripts that would merge the content of the changed schema back into the original schema. Regardless of whether or not the release was a canary release, a rollback needs verification to ensure data integrity. Testers can help review the change and the undoing of the change.
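
A rollback check of this kind can be sketched as a round trip: apply the canary schema change, apply the rollback script, and compare the data with what was there before. The Python sketch below uses an in-memory SQLite database; the `customers` table, the split of `full_name` into two columns, and all function names are hypothetical and only illustrate the idea.

```python
import sqlite3


def snapshot(conn):
    """Return the customer rows in a stable order for comparison."""
    return conn.execute(
        "SELECT id, full_name FROM customers ORDER BY id").fetchall()


def apply_canary_schema(conn):
    """Hypothetical canary change: split full_name into two columns."""
    rows = snapshot(conn)
    conn.execute("CREATE TABLE customers_v2 "
                 "(id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
    for row_id, full_name in rows:
        first, _, last = full_name.partition(" ")
        conn.execute("INSERT INTO customers_v2 VALUES (?, ?, ?)",
                     (row_id, first, last))
    conn.execute("DROP TABLE customers")


def roll_back_schema(conn):
    """Hypothetical rollback script: merge the columns back together."""
    conn.execute("CREATE TABLE customers "
                 "(id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO customers (id, full_name) "
                 "SELECT id, first_name || ' ' || last_name FROM customers_v2")
    conn.execute("DROP TABLE customers_v2")


def rollback_mismatches(names):
    """Run the round trip and return (before, after) pairs that differ."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers "
                 "(id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.executemany("INSERT INTO customers (full_name) VALUES (?)",
                     [(name,) for name in names])
    before = snapshot(conn)
    apply_canary_schema(conn)
    roll_back_schema(conn)
    after = snapshot(conn)
    conn.close()
    return [(b, a) for b, a in zip(before, after) if b != a]
```

With these made-up scripts, two-word names survive the round trip, but a single-word name such as "Cher" comes back with a trailing space. That is the kind of silent discrepancy a review of the change and of its undoing is meant to catch.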

It’s also important to note that the appeal or lack of appeal of a new feature doesn’t indicate the impact of the change on the system. User acceptance or rejection could result from the effectiveness of marketing, or from how visibly the features are presented. For example, a workflow that users like doesn’t by itself reveal whether any data is getting corrupted. Even if users acknowledge a new feature, we can’t be sure they have explored it in all possible ways to detect any flaws, which is the major expectation from a canary release.

Canary Releases have some inherent risks

Upgrades:

Canary releases, by definition, are rolled out to a subset of users of the app. Their app versions would need to upgrade to get the release. However, some of the selected users may not have upgraded the app to the new version. They may not have opted for background upgrades, they may be on a mobile network while having opted for upgrades only over Wi-Fi, or they may have other constraints (such as enterprise security controls) that prevent them from upgrading the app.

If such users are counted in the test results, product owners may assume they have upgraded to the canary release when they haven’t, and the canary test results would be wrong.
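
One way to guard against this is to count only sessions where the app itself reported the canary version, and to list the assigned users who never ran it. The Python sketch below assumes a hypothetical telemetry record (`Session`) that carries the version reported by the running app; the field names and the `canary_error_rate` helper are illustrative, not taken from any real analytics tool.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user_id: str
    app_version: str  # version reported by the running app, not the one assigned
    had_error: bool


def canary_error_rate(sessions, canary_users, canary_version):
    """Error rate among canary users who actually ran the canary build.

    Returns (rate, users_not_upgraded). The rate is None when no assigned
    user ran the canary version at all.
    """
    assigned = [s for s in sessions if s.user_id in canary_users]
    upgraded = [s for s in assigned if s.app_version == canary_version]
    not_upgraded = ({s.user_id for s in assigned}
                    - {s.user_id for s in upgraded})
    rate = (sum(s.had_error for s in upgraded) / len(upgraded)
            if upgraded else None)
    return rate, sorted(not_upgraded)
```

If three users are assigned to the canary but only one of them upgraded and hit an error, the naive error rate over all three looks like one in three, while the rate among sessions that really ran the canary build is one in one.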

The only way to ensure every user in the canary subset upgrades is to initiate a forced upgrade. Companies don’t use forced upgrades frequently because they annoy users, and prefer to reserve them for critical breaking changes or API version changes.

The Domain:

Depending on the domain of the app (health, banking, trading, finance), canary releases could be risky. If we are dealing with a customer’s money or health via the app, it’s risky to go canary without thorough testing. Data loss, data corruption, or a missing or wrong display of a bank amount or health statistic can be a major issue for the user even if the app isn’t crashing and appears to work. The user has already had a bad experience. By the time we roll back upon seeing the canary suffer, a few miners could have already been severely impacted by the toxic gas!

A canary release doesn’t guarantee the detection of all issues with a small percentage of customers using it. This is because users won’t necessarily be testing the app the way testers do, or may not be using all the features at a time within the canary testing window. 
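
A rough calculation shows why a small cohort can miss an issue. The sketch below assumes, purely for illustration, that each canary user independently exercises a given feature during the canary window with the same probability; real usage is rarely that uniform. The function name and the example numbers are hypothetical.

```python
def chance_nobody_exercises(feature_usage_rate: float, canary_users: int) -> float:
    """Probability that no canary user touches a feature during the window.

    Assumes every user independently uses the feature with probability
    feature_usage_rate, which is a simplification.
    """
    if not 0.0 <= feature_usage_rate <= 1.0:
        raise ValueError("feature_usage_rate must be between 0 and 1")
    if canary_users < 0:
        raise ValueError("canary_users must not be negative")
    return (1.0 - feature_usage_rate) ** canary_users
```

Under those assumptions, with 50 canary users and a feature that 2% of users touch during the window, `chance_nobody_exercises(0.02, 50)` comes to roughly 0.36: about a one-in-three chance that nobody in the cohort exercises the feature at all.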

Feature availability per profile:

Usually, not all features of the app are available to all users; availability depends on their profiles, location, the time they have to go through the app, and the current date. For example, in an energy utility scenario, the selected canary users may not have any upcoming bills to review and pay, they may have scheduled payments for a future month outside the canary testing cycle, or the canary may not fall in the month when they submit their meter readings. In such cases, the user may not use all the features affected by the new changes. Some call-to-action buttons may not be enabled for a few users, while some notifications may not pop up. This implies that even if there are flaws in the release, the user may not notice them and the risks may get released to a bigger audience. On the other hand, a skilled exploratory tester would ensure all the scenarios and changes, both existing and new, are reviewed and validated by creating suitable test data or by service virtualization.
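
Before trusting a canary result, a team could check which features the selected cohort can reach at all during the canary window. The Python sketch below does this for the energy utility example; the profile fields (`bill_due`, `reading_months`), the eligibility rules, and the feature names are all made up for illustration.

```python
from datetime import date


def bill_payable(user, start, end):
    """A bill can be paid only if it falls due inside the canary window."""
    due = user.get("bill_due")
    return due is not None and start <= due <= end


def reading_due(user, start, end):
    """Meter readings are accepted only in the user's reading months.

    Checks only the first and last month of the window, so it assumes the
    window spans at most two calendar months.
    """
    months = user.get("reading_months", ())
    return any(month in months for month in {start.month, end.month})


# Hypothetical feature eligibility rules.
FEATURE_RULES = {
    "pay_bill": bill_payable,
    "submit_meter_reading": reading_due,
    "view_usage": lambda user, start, end: True,
}


def unreachable_features(cohort, start, end):
    """Features that no user in the cohort can exercise during the window."""
    return sorted(
        name for name, rule in FEATURE_RULES.items()
        if not any(rule(user, start, end) for user in cohort)
    )
```

Any feature this check reports is one the canary cohort cannot exercise in that window, so its behaviour has to be covered another way, for example with prepared test data before the release.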

Roll back:

If a canary release needs to be rolled back, say because of data corruption, the rollback carries a risk as well. A rollback of a canary build won’t automatically fix the corrupted data. Suppose the canary had an issue and users tried to open documents/PDFs for bills, payments, recent transactions, activities, etc., while they were using the canary build. The users received error messages when opening the PDFs. They reported the issue, we acknowledged it and decided to roll back. However, the PDFs that were opened with the canary build and showed the error message remain corrupted even after the rollback. Users would still see the error message on opening those particular PDFs.

Present-day automated deployments can help replace a canary build with a previous stable version but one also needs to consider the impact on the data and other artifacts. Often, a skilled tester works with architects, business folks, and developers to work out an intermediary build because it may not be possible to roll back to the earlier trusted build.

Misinterpreting Canary test results can be disastrous

Product owners often want to give a go-ahead to roll out the release to more users after seeing that all canary test results are successful. Analyzing the test results is very important, but failing to understand their implications, or misinterpreting the data, can lead to serious issues.

A team needs to incorporate certain due diligence and safeguards by way of engineering when preparing for a canary release.

Mocks don’t always bring out all the risks around Acceptance criteria. That’s not what they are intended for.

To facilitate high-speed testing, technical teams have started to leverage mocks and stubs. These certainly speed up the verification of workflows. However, one needs to consider what mocks and stubs provide us with so that we don’t expect undue support from them.

Mocks give us predefined output. They behave only as they have been designed to. Unlike end-to-end automated tests, mocks don’t run a whole scenario that can give us the confidence we want. Instead, mocks merely support automated tests with predefined responses.
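
A small example makes the point concrete. The Python sketch below uses the standard library's `unittest.mock.Mock`; the `show_balance` function, the `get_balance` method, and the response shape are hypothetical stand-ins for a real bank API client.

```python
from unittest.mock import Mock


def show_balance(bank_api, account_id):
    """Format the balance returned by a bank API client (hypothetical)."""
    response = bank_api.get_balance(account_id)
    return f"{response['currency']} {response['amount']:.2f}"


# The mock returns exactly what it was told to, for any account ID.
mocked_api = Mock()
mocked_api.get_balance.return_value = {"currency": "GBP", "amount": 120.5}

print(show_balance(mocked_api, "acc-1"))            # GBP 120.50
print(show_balance(mocked_api, "no-such-account"))  # GBP 120.50 as well

# A response shape nobody predefined (assumed here: a missing amount)
# only shows up when someone deliberately tries it.
mocked_api.get_balance.return_value = {"currency": "GBP", "amount": None}
try:
    show_balance(mocked_api, "acc-1")
except TypeError as error:
    print("unhandled response:", error)
```

The checks against the mock pass for every account ID, including one that doesn't exist, because the mock never consults a real system. The missing-amount case is found only because someone thought to ask for it.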

Due to their technology and intent, one can’t expect or demand that mocks behave as users or skilled exploratory testers would. For example, mocks don’t cover the interruption scenarios the user might go through while using the app. There are also external influences such as biometrics, scans of credit/debit cards, scans of redesigned QR code posters under various lighting, distance and print-quality conditions, meter reading scans, beta versions of the mobile OS, switching between multiple apps on the device, offline availability of the app, etc. These factors alone contribute to uncertainty and variability in the scenarios/user journeys that one needs to explore and review.

Testing before a release is still a test. Such a test can add significant quality to the product.

There is a growing trend to have real users test the app/product. We see teams citing reasons like: “We don’t have real-time data or conditions that users may have, too many backend systems are involved and it’s difficult to create those kinds of scenarios,” etc.

Well, these aren’t valid reasons to disappoint customers when they are using the app. Acceptance criteria are for the team to confirm, not the users. Not all acceptance criteria may even apply to a single user. Our app may display differently depending on the user’s location, the type of payment card, or the type of usage they have chosen for the default display. The variations can be many, and the small selected user base may not get to test all of them either. We shouldn’t expect end users to review and confirm the acceptance criteria for us.

Testing ensures that there is no bad or negative user experience while we do a canary release.

A canary build user may reject an app because of a change that the wider audience may never encounter. This is a significant but rarely considered problem that catches clients unaware. It can rarely be undone. Public explanations may only make bad impressions worse. Consider a scenario where we have added an extra model in the middle of the screen and hence moved a button/link to the bottom of the app to submit/login or to take any further action. If a canary user is using a small mobile screen with zoom enabled, the scrolling won’t work well. The user will see the button as half-cut and non-clickable, and will be unable to move ahead. The resulting frustration shows up as a bad review or bad ratings on the stores. A larger section of the user base (canary users or everyone) may be able to access the button perfectly fine and may not encounter the half-cut or non-enabled button at all. However, this same group of users does look at the store’s ratings and reviews.

To verify if new features are working well, one must test (review by actually using) before a release to ensure that features work as intended.

Depending upon the user experience improvement, users may intuitively proceed through the workflow and may not notice improvements; the best improvements are that way! Certain improvements, such as reduced steps in a workflow, are easily evident. Users definitely like apps that let them get their work done quickly and efficiently. However, certain fixes work so well that users simply proceed through the workflow without noticing them. One might need a build with specific telemetry to understand just how bad a workflow is, and only then know what to measure with an improved workflow. In the absence of such telemetry, it can be difficult to quantify user satisfaction before and after an improvement. This is a serious topic that has led to entire careers involving psychology, technology, usage patterns, special equipment to observe and analyze user behavior, and so on. A skilled tester can often advise a team about pain points and improvements well before user complaints arrive. Likewise, a skilled tester can advise on the specific workflow points where telemetry should be added before and after a user-experience improvement.

Achieving reliable software releases

If your app doesn’t risk the health or money of users, and you don’t care about the ratings in app stores or website reviews, you can consider releasing a canary build without testing thoroughly.

Canary releases don’t imply that testing can cease before a release. By analogy, a coal mine can’t be constructed haphazardly just because the miners carry a canary to alert them about fatal gas pockets. Even if there are no such gases, the miners (and the canary) can still die if weak support structures cause the mine walls to collapse.

As we have seen, canary releases require quite some engineering, internal review, and testing to ensure that data doesn’t get corrupted and that new data can be read by an interim upgrade instead of a full rollback. Canary testing is therefore more than just the procurement and rollout of a canary testing tool and of deployment automation.

Skilled testers can help ensure that canary releases are cost-effective, so that product owners can concentrate on meaningful feedback. A skilled tester can help ensure that users are delighted by new features, rather than anxious upon discovering data loss and degraded experiences. For automated verification of functionality, it helps if developers create, understand and contribute to end-to-end tests. Developers who do so start to gain a broader understanding of the domain, going beyond just the specific work that they have done.

Once users are exposed to reliably functioning canary releases, they may even ask to opt in for early previews, provide relevant feedback, and engage with experience designers more enthusiastically. Businesses can then focus their spending on revenue-generating and customer-pleasing features.

One would have to spend a lot of money on public relations and rebranding to undo the poor image caused by a bad canary release and by negative comments on app stores and marketplaces. A company certainly doesn’t want the marketing, sales, and public relations teams to second-guess every enhancement.

Product stakeholders who understand that skilled testers can check and review for a certain level of quality will realize the necessity of a tester, even for a canary release.

Canary releases are addictive but introduce their own risk. It’s important to conduct exploratory tests to manage the risks with your fail-fast approach as rolling back may not address the lost users and revenue. Keep calm and test before a canary release. Miners know that even if the canary itself is alive, someone still needs to ensure that the walls don’t collapse.

About the Author

Vaishali Desarda is interested in Quality. She has 10+ years of experience testing and prescribing test strategies for mobile apps, web apps, and APIs in the Fintech, Open Banking, and Utilities sectors. She works with clients to help them tailor their apps to meet their customers’ needs by understanding the domain, identifying manageable risks, and applying common sense and functional knowledge. She freely shares her advice on practical matters for testers at www.vaishalidesarda.com.


 
