When Feature Flags Go Wrong
Feature flags, if used correctly, can superpower development, allowing developers, ops, QA, product, marketing and sales to bring better features to market, faster. Feature flags help software teams separate software releases (code being live on production but not necessarily visible to users) from feature releases (features being live and accessible). A feature flag acts as a gate. At its simplest, a feature can be turned “on” or “off”, independently of a deployment.
However, a feature flag can be more than a boolean. A feature flag can target percentages of users or specific segments, and can be as complex as needed to help not only with release management (canary releases, dark launches, progressive rollouts) but also with long-term control (access control rights, different customer one-offs, and behavioral control).
Feature flags can also go very, very wrong if not managed correctly, in one case leading to a $460 million loss in 30 minutes. If misused or mismanaged, feature flags can be the worst kind of technical debt. In this article I’ll walk through some horror stories of feature flags gone bad, and the lessons learned.
Ambiguous/reused flag names
A flag should have a clear, well-understood name. A flag named “user_control” is ripe for misunderstanding. One backend team once thought that a given flag controlled the functionality they were using. However, unknown to them, a frontend team had also reused the flag to gate some of their own functionality. The two teams kept flipping the flag to whichever state each of them wanted. Like two people fighting over a light switch, the flag was never in the right state for both. Eventually they figured it out (after much teeth-gnashing) and split it into separate flags.
Flags can be used for both short- and long-term control. For short-term release management, code can be deployed to production and then turned on by the teams that are validating functionality:
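One way to make the “user_control” collision fail loudly instead of silently is to register every flag with an owner and a description, and refuse reuse by a different team. The sketch below is a minimal, hypothetical in-memory registry (`FlagRegistry` and `FlagDefinition` are illustrative names, not any product’s API); a real flag management system would persist this and enforce it in its UI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagDefinition:
    key: str          # specific name, e.g. "billing_v2_api", not "user_control"
    owner: str        # team accountable for the flag
    description: str  # what the flag gates, in plain words


class FlagRegistry:
    """Hypothetical registry that refuses to let a second team reuse a flag key."""

    def __init__(self) -> None:
        self._flags: dict[str, FlagDefinition] = {}

    def register(self, flag: FlagDefinition) -> FlagDefinition:
        existing = self._flags.get(flag.key)
        # A different owner asking for the same key is exactly the reuse we want to stop.
        if existing is not None and existing.owner != flag.owner:
            raise ValueError(
                f"flag '{flag.key}' is already owned by {existing.owner}; "
                "create a new, specific flag instead of reusing it"
            )
        if not flag.description.strip():
            raise ValueError(f"flag '{flag.key}' needs a description")
        self._flags[flag.key] = flag
        return flag
```

With this in place, the frontend team’s attempt to register a key the backend team already owns raises a `ValueError` at registration time, long before anyone starts flipping switches against each other.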
- QA - can ensure that the new features work as expected
- Performance - can use percentage rollout to verify that the feature scales in the real world
- Product - can invite early access users to give feedback
- UE - can conduct usability tests
- Marketing - can invite press or analysts to preview new functionality
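The percentage rollout used for performance testing is commonly implemented by hashing each user into a stable bucket. The sketch below assumes that approach (SHA-256 over the flag key plus user ID, 10,000 buckets); the function name `in_rollout` is hypothetical, and vendors may hash differently.

```python
import hashlib


def in_rollout(flag_key: str, user_id: str, percentage: float) -> bool:
    """Return True if user_id falls inside the first `percentage` percent of buckets.

    Hashing flag_key together with user_id gives each user a stable bucket per
    flag, so a user who sees the feature at 10% keeps seeing it at 20%.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percentage * 100
```

Because buckets are stable, raising the percentage only ever adds users; nobody who already had the feature loses it mid-rollout.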
A long-term flag is used for permanent control of a section of code. A long-term flag might be used for entitlements or for segmenting a user base. For example, a SaaS company might have a flag that gives different enterprise customers access to different parts of the functionality, or that gives beginning users different features than advanced users. An easy way to tell a short-term flag from a long-term one is to ask whether the intent is for 100% of people to eventually get one variation. If so, it is a short-term flag, and plans should be made to remove it. A short-term flag is meant to be temporary: it should be removed as soon as it has served its purpose. LinkedIn is legendary for not cleaning up flags. It once accidentally ran a release with all flags flipped to “on”, which made the site unusable as conflicting and out-of-date functionality clashed.
A best practice is to use naming conventions for short- vs. long-term flags. For example, prefix all short-term flags with “Temp”, or use a feature flag management system where flags are tagged with their purpose. Then, ideally, you can see whether a flag is serving the same variation to 100% of users and should be retired. Likewise, if a flag is not being evaluated by anyone, it’s worth following up on why a stale code path is still in place. Sometimes the most valuable job an engineer can do is delete old code: if code that has served its purpose is safely retired, there is less to test and verify in the future.
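Given the “Temp” naming convention and per-flag evaluation counts, a cleanup report can be produced mechanically. This is a sketch under stated assumptions: the `temp_` prefix (matched case-insensitively), the `FlagUsage` shape, and the idea that your flag system exports per-variation counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlagUsage:
    key: str
    evaluations: int                  # evaluations seen over the reporting window
    variation_counts: dict[str, int]  # how often each variation was served


def cleanup_candidates(usages: list[FlagUsage], temp_prefix: str = "temp_") -> dict[str, str]:
    """Map flag key -> reason the flag deserves a follow-up."""
    reasons: dict[str, str] = {}
    for usage in usages:
        if usage.evaluations == 0:
            # Nobody evaluates it: the code path behind it may be dead.
            reasons[usage.key] = "not evaluated by anyone: stale code path?"
            continue
        served = [v for v, n in usage.variation_counts.items() if n > 0]
        is_short_term = usage.key.lower().startswith(temp_prefix.lower())
        # Only short-term flags are expected to converge on one variation.
        if is_short_term and len(served) == 1:
            reasons[usage.key] = "short-term flag serving one variation to 100% of users: retire it"
    return reasons
```

Long-term flags such as entitlements are deliberately left out of the “100%” check, since serving one variation to everyone can be their normal state.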
Visibility to technical (and non-technical) users
If you’re using feature flags to control functionality, you’re creating multiple states of your system. Make sure that all the groups (Engineering, QA, Performance, Support, Marketing, Product) know these flags exist. At the most basic, make sure that frontend and backend teams are in sync. If you have multiple microservices, make your flags visible so that each team knows how other teams’ flags can change the interfaces it depends on.
One technical team once ran all their integration tests against another team’s service, thought they’d passed everything, and released. The other team was using a feature flag for a migration. When that team flipped their flag, the first team’s code broke. Make sure that everyone knows that you’re using flags!
Pulling flags out of config files and into a centralized user interface benefits everyone. The biggest benefit is speed: changing a flag in a config file often requires a full system release, which could take minutes, if not hours. If flags can be changed independently of a release, the entire system can move quicker. It’s also important that there be one centralized place for all feature flags. The horror story is five teams with five config files. Chaos ensues.
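Once a team knows a dependency sits behind a flag, its contract tests can run against both flag states instead of whichever state happens to be live. The sketch below is hypothetical: `fetch_invoice` stands in for the other team’s service, and `use_new_store` for their migration flag.

```python
import unittest


def fetch_invoice(invoice_id: str, use_new_store: bool) -> dict:
    """Hypothetical service call whose backing store depends on a migration flag."""
    source = "new_store" if use_new_store else "legacy_store"
    return {"id": invoice_id, "total_cents": 1250, "source": source}


class InvoiceContractTest(unittest.TestCase):
    def test_contract_holds_for_both_flag_states(self) -> None:
        for flag_on in (False, True):
            # subTest reports each flag state separately if one of them fails.
            with self.subTest(migration_flag=flag_on):
                invoice = fetch_invoice("inv-42", use_new_store=flag_on)
                # The consuming team's expectations must hold either way.
                self.assertEqual(invoice["id"], "inv-42")
                self.assertIsInstance(invoice["total_cents"], int)
```

Run it with `python -m unittest`; a failure names the flag state that broke the contract, which is exactly the information the first team was missing.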
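The difference between a config-file flag and a centralized one is where the value is read and when. A minimal sketch, assuming an in-process `CentralFlagStore` as a stand-in for what would really be a shared network service or vendor product: the flag is evaluated on every request, so a change takes effect without a deploy.

```python
import threading


class CentralFlagStore:
    """Hypothetical single source of truth for flags, shared by every team."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict[str, bool] = {}

    def set(self, key: str, enabled: bool) -> None:
        with self._lock:
            self._state[key] = enabled

    def is_enabled(self, key: str, default: bool = False) -> bool:
        with self._lock:
            return self._state.get(key, default)


store = CentralFlagStore()


def render_checkout() -> str:
    # Read at request time rather than baked in at build time,
    # so flipping the flag needs no release.
    if store.is_enabled("new_checkout"):
        return "new checkout"
    return "old checkout"
```

Calling `store.set("new_checkout", False)` is also the kill switch customer support would use to turn off a buggy feature while engineering iterates on it.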
If a flag is necessary for customers to get access to features, customer support should know about the flag. They’ll need to be able to troubleshoot and turn flags on and off. This is a huge benefit of feature flags - if a feature is buggy or slow, it’s easy for customer support to turn it off while the feature is being iterated on by engineering.
Flags are for business too
Beyond customer success, flags can allow sales and marketing to engage existing and potential customers with new functionality. The ultimate power of a flag is that non-technical users can potentially control it. If flags are buried in config files tied to a deployment, their value is buried too, because changes aren’t real-time and require engineering resources.
Because non-technical users don’t necessarily understand JSON or config files, they remain dependent on engineering. It’s slow and tedious for everyone when a non-technical user has to open a ticket, wait for an engineer to update the flag, and then wait for a release.
Ideally flags should be in a centralized, visible place where they can be controlled independently of code releases. When a big prospect wants access to a feature, the salesperson should be able to turn it on without having to ask engineering. When it’s time for launch, marketing should be able to grant access to favorite reporters.
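What a salesperson or marketer edits in such a dashboard is usually a targeting rule rather than code. A sketch, assuming a simple hypothetical rule shape (explicit accounts, named segments, and a default); real products offer richer rule languages.

```python
from dataclasses import dataclass, field


@dataclass
class TargetingRule:
    """Hypothetical flag rule a non-engineer could edit from a dashboard."""
    enabled_accounts: set[str] = field(default_factory=set)  # e.g. a big sales prospect
    enabled_segments: set[str] = field(default_factory=set)  # e.g. "press" at launch
    default: bool = False                                     # everyone else

    def evaluate(self, account_id: str, segments: set[str]) -> bool:
        if account_id in self.enabled_accounts:
            return True
        if self.enabled_segments & segments:  # any overlapping segment grants access
            return True
        return self.default
```

Adding `"acme-corp"` to `enabled_accounts`, or `"press"` to `enabled_segments`, changes who sees the feature without touching the code path that calls `evaluate`.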
Flags ultimately allow business users to enable existing functionality, without having to code or bother developers.
Control access to flags
Think about the purpose of each flag. Is it for a...
- performance engineer to test load on the system?
- marketing team to switch on functionality at the time of the launch?
- usability group to do user acceptance tests with some early adopters?
With great power comes great responsibility. These three flags serve very different use cases and should have distinct groups touching them. Once flags are visible and controllable by different groups in your organization, make sure that rights are granted accordingly. Don’t allow a business user to accidentally start a load test on a risky new feature, or a technical user to accidentally grant a customer access to functionality they aren’t entitled to or will be confused by.
If someone is turning a flag “on” or “off”, make sure that you know who it is! Whether that someone is an actual human or an automated script, if a flag is causing abnormal behavior, you want to know who caused it, so you can tell them to stop and debug the issue. It’s human nature to make bad decisions under stress. The brain goes into “fight or flight” mode, flooding the frontal cortex with hormones. This is good if you need to quickly run away from a lion. This is bad if you need to calmly walk through a complicated decision tree. With feature flags, you can ideally turn off a bad feature and, at your leisure, figure out what went wrong.
How to lose a half billion dollars before morning coffee
So what happens if just three feature flag mistakes happen at once?
Knight Capital was once the largest high-frequency trader on Wall Street, until it lost $460 million in 30 minutes. The full report is available from the SEC. In short, Knight reused a flag (mistake #1) for a new project. The obsolete code that the flag used to control was still present in the codebase (mistake #2). When the flag was flipped, the obsolete code, still deployed on one out of eight servers, fired off trades in quick succession. Because the flag could not easily be turned off without a deploy (mistake #3), a panicked follow-up release actually deepened the issue: Knight’s updated release activated the obsolete code on all eight servers. All in all, $460 million (almost half a billion dollars!) was gone, all before 10 AM. Knight Capital could have prevented this situation by not reusing a flag, by cleaning up its code, OR, in what should have been the worst-case scenario, by being able to turn the flag off separately from a deploy.
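Granting rights per group and recording every change can live in the same place. The sketch below is hypothetical (`GuardedFlags`, `AuditEntry`, and the group names are illustrative): each flag lists the groups allowed to change it, and every accepted change is appended to an audit log with the actor, whether that actor is a person or a script.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    actor: str        # a human or an automated script
    flag_key: str
    enabled: bool
    at: datetime


class GuardedFlags:
    def __init__(self, allowed_groups: dict[str, set[str]]) -> None:
        # flag key -> groups allowed to change it, e.g. {"load_test_search": {"performance"}}
        self._allowed_groups = allowed_groups
        self._state: dict[str, bool] = {}
        self.audit_log: list[AuditEntry] = []

    def set(self, actor: str, actor_groups: set[str], flag_key: str, enabled: bool) -> None:
        permitted = self._allowed_groups.get(flag_key, set())
        if not permitted & actor_groups:
            raise PermissionError(f"{actor} may not change '{flag_key}'")
        self._state[flag_key] = enabled
        # Record who changed what, and when, for later debugging.
        self.audit_log.append(AuditEntry(actor, flag_key, enabled, datetime.now(timezone.utc)))

    def is_enabled(self, flag_key: str) -> bool:
        return self._state.get(flag_key, False)
```

Unlisted flags deny everyone by default, which errs on the side of a business user not accidentally starting a load test.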
Ultimately you’re responsible for producing quality software predictably, quickly and in a repeatable fashion. Feature flags can help you and your team deliver more features faster, and with less risk. Make sure that you’re treating your feature flags as they deserve, not as an afterthought. Make sure they’re manageable, visible, actionable and trackable. Feature flags can be an integral part of modern software development, so give their management first class support.
About the Author
Edith Harbaugh has more than 10 years of experience in product, engineering and marketing with both consumer and enterprise companies. She holds two patents in deployment. Edith has given talks on feature flag management at Microsoft Build, NDC Sydney, GlueCon, and DevOps West. She co-hosts the “To Be Continuous” Podcast with Paul Biggar, CircleCI founder. Edith earned a BS in Engineering from Harvey Mudd College. She enjoys trail running distances up to 100 miles.
Great article! Here are a few more tips.
Beyond distinguishing short-term from long-term flags, I'd go further and suggest two different systems. Long-term flags have very different testing requirements and should live in a different place in the UI. They often have very specific purposes and shouldn't have all of the options available to feature flags. You'll need to be very careful to test all future feature development against both branches of the flag, and possibly many combinations of multiple flags. Conversely, short-term feature flags are scaffolding and should be removed as soon as the feature has been shipped and proven in production. Obviously, having two redundant systems has its own cost, so this may be a pattern reserved for larger teams and more complex codebases.
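To see why testing combinations of long-term flags gets expensive, it helps to enumerate them: n on/off flags produce 2**n states. A small sketch (the flag names are hypothetical) that a test harness could loop over:

```python
from itertools import product


def flag_combinations(flag_keys: list[str]) -> list[dict[str, bool]]:
    """Every on/off assignment for the given long-term flags (2**n of them)."""
    return [
        dict(zip(flag_keys, states))
        for states in product((False, True), repeat=len(flag_keys))
    ]


# Three entitlement flags already mean eight configurations to test.
combos = flag_combinations(["sso", "audit_export", "custom_roles"])
```

Ten such flags would mean 1,024 configurations, which is one argument for keeping long-term flags few and managed separately.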
For actual feature flags, you should have a production alert that pages if the flag has been in the codebase for longer than X days. At the least this forces a very valuable conversation, and often it finds dead or broken code that could cause serious issues down the road.
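The age-based alert can start as a daily job that compares each short-term flag’s creation date against a limit. A sketch, assuming a 30-day limit and a hypothetical `ShortTermFlag` record; wiring the result into a pager is left to whatever alerting system you already run.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ShortTermFlag:
    key: str
    created: date
    owner: str  # who gets paged


def overdue_flags(flags: list[ShortTermFlag], today: date, max_age_days: int = 30) -> list[ShortTermFlag]:
    """Return short-term flags older than max_age_days, oldest first."""
    limit = timedelta(days=max_age_days)
    overdue = [f for f in flags if today - f.created > limit]
    return sorted(overdue, key=lambda f: f.created)
```

A flag exactly at the limit is not yet overdue; anything past it starts the “why is this still here?” conversation with its owner.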
Ideally, toggling a flag goes through the same deploy process as code changes (automatic gradual rollout with automatic rollback in the event of problems). You might even switch the default flag state in your automated tests to ensure compatibility with the entire codebase. This may be as simple as storing flag state in a config file in source control, or as complicated as a small web app that toggles state and triggers a deploy. Toggling a feature flag is often a significantly larger, riskier change than a code deploy, especially if you're practicing Continuous Deployment and deploying early and often.
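A gradual rollout with automatic rollback can be expressed as a loop over percentage steps with a health check after each one. This is a minimal sketch: `set_percentage` and `is_healthy` are hypothetical hooks into your flag system and monitoring, the step list is an arbitrary example, and a real pipeline would wait and collect metrics between steps.

```python
from typing import Callable


def gradual_rollout(
    set_percentage: Callable[[int], None],
    is_healthy: Callable[[], bool],
    steps: tuple[int, ...] = (1, 5, 25, 50, 100),
) -> bool:
    """Raise a flag's rollout step by step; roll back to 0% on the first failed check.

    Returns True if the flag reached the final step, False if it was rolled back.
    """
    for pct in steps:
        set_percentage(pct)
        if not is_healthy():
            set_percentage(0)  # automatic rollback
            return False
    return True
```

Because the rollback path is just another call to `set_percentage`, turning the feature off never requires a code deploy.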