
Measuring Value Realization through Testing in Production



The panelists discuss the best patterns for testing in production and how the feedback it provides can be built back into the continuous delivery lifecycle of DevOps.


Karishma Irani is product management lead @LaunchDarkly. Sean Davis is DevSecOps advisor @TransUnion. Andreas Prins is director engineering Mendix Data Hub @Mendix. Orit Golowinski is senior product manager of the Release Stage @GitLab. Mike Burrows is founder @Agendashift. Shane Hastie is director of community development @ICAgile.

About the conference

InfoQ Live is a virtual event designed for you, the modern software practitioner. Take part in facilitated sessions with world-class practitioners. Hear from software leaders at our optional InfoQ Roundtables.


Hastie: I'm Shane Hastie. It is my privilege to be the host for the measuring value realization through testing in production roundtable. We have a panel who I'm going to ask to introduce themselves briefly, and answer the question, what does testing in production mean to you?

What Testing In Production Means

Would you introduce yourself and tell us what does testing in production mean?

Golowinski: My name is Orit. I'm the senior product manager for the release stage at GitLab. I'm also an ambassador at the DevOps Institute. Testing in production, for me, is the next step for continuous delivery. It takes small increments of code changes and lets you distribute them to a controlled audience. What's important here is getting that quick feedback in production to know whether you can continue rolling out your feature to a larger audience, or whether you have to roll back and disable that feature. Since it's controlled, only a small percentage of users are affected.
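The controlled rollout Orit describes can be sketched in a few lines. This is a minimal illustration, not GitLab's or any vendor's actual mechanism, and the function and flag names are hypothetical. The key idea is deterministic bucketing: hashing the user id keeps each user stably in or out of the rollout as the percentage grows.

```python
import hashlib

def is_enabled(feature: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket a user into [0, 100) for a feature.

    The hash of (feature, user_id) is stable, so a given user stays in
    or out of the rollout between requests -- no flip-flopping.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# Rolling out to 10% of users: roughly 1 in 10 user ids pass the check.
enabled = [u for u in (f"user-{i}" for i in range(1000))
           if is_enabled("new-checkout", u, 10)]
```

Because the bucket is fixed per user, raising `rollout_percent` from 10 to 50 only adds users; everyone already enabled stays enabled, which keeps feedback from the initial cohort consistent.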

Prins: I'm Andreas Prins, here from the Netherlands. I'm operating as the Director of Engineering within Mendix. Prior to that, I worked for another CI/CD company, alongside GitLab and the others in this space. To add to what's already been said, testing in production for me is really extending your view from a product quality focus towards a product impact focus. It's really another view of the same object, to learn more about that object and how it's behaving.

Irani: I'm Karishma, Product Management Lead at LaunchDarkly. What does testing in production mean to me? It means the engineer can answer the question, "is this going to work?" with as much confidence as I feel when I'm about to launch something for customers. Quite honestly, for me, it brings the entire team together so they can swarm, feel confident about the experience we're about to deliver to our end users, and be on the same page.

Davis: I'm Sean Davis. I'm a DevSecOps advisor for TransUnion. What it means to me? I could sum it up by saying it's really important because it helps connect our customers to our development and engineering resources by providing those critical feedback loops that ultimately help create more secure, higher quality applications and experiences for our customers. Really delivering that level of excellence to our customers, but also taking a lot of their experiences and translating that back to actionable information that we can give to our engineering and development teams to provide better quality and better security.

Burrows: I'm Mike Burrows, founder of Agendashift, perhaps known also as an author. The second edition of the "Agendashift" book is about to come out. For me, I always like to work backwards from my definition of done, someone's need was met. You can only be really sure that you're actually meeting a need once the thing is actually in people's hands, and looking forward to that moment. Working backwards from that moment really explains a lot of what I do.

Shifting Left

Hastie: Aren't we supposed to be shifting left? Hasn't that been the big mantra recently? Who would like to explore that one?

Prins: The interesting thing about shift left is that the framing itself is a bit absurd, because shifting left assumes there is a beginning. If you think about continuous application development and continuous delivery, what if it's a circle? Keep shifting left, and all of a sudden you're on the other side. I think that's the important point, because just before the start of development there is the go-live and the live state of that particular feature, and the announcement we want to make. To me, it's not one or the other. If you think about shift left, definitely detect as early as possible, but at the same time, if development is circular, which it is to me, then moving to the right, or expanding to the right, is just closing the loop by understanding how the baby behaves in production, and ultimately raising the baby towards an adult. That's what's important to me. It's not a dictionary definition.

Burrows: Another way of explaining, pretty much to Andreas' point, a lot of things change when they're continuous. The Iron Triangle is a thing of the past for many of us. A good process is one that asks the right questions at the right time. Our continuous activities are all plugged together in different ways, but each of them needs to ask certain questions. We need to make sure they're asked at the optimum time, and often that means asking that question earlier than we might have done 10, 20, 30 years ago.

Golowinski: I agree. I think that in a continuous process, we're both shifting left and right, because it's an endless loop, and so everything goes up and downstream. I think what's important here is that when we say testing in production, it doesn't mean only testing in production, we need to test also left during every single different phase that we can, as soon as possible. We still need to do that testing in production. It's important to note that when we say shift left, it doesn't mean let's just push everything to production.

How to Enable Push Right and Left at the Same Time

Hastie: How do we enable that, push right and push left at the same time?

Irani: In my opinion, when I talk to customers, they talk about how their managers or their teams are trying to shift left, and they don't sound excited when describing it. I'm like, tell me more. It turns out that a few companies are trying to create this mindset shift without actually providing the correct resources and the correct processes for teams to be able to continuously test, continuously deliver, and continuously deploy. I've seen the value that LaunchDarkly delivers with feature flags, which let customers test a feature on their own branch the way they would in production, by actually deploying it without releasing it to end users. It's almost impossible to use LaunchDarkly and not shift left. To my point, trying to create that mindset shift without equipping teams with the right resources and the right processes, and expecting them to just shift left, is not going to be beneficial.

Burrows: It's hard to test in production if you haven't got a hypothesis to test, and users to test it with. That's quite technical language; I think even more deeply than that, you've got to know what needs you are meeting. I've already mentioned my definition of done: someone's need was met. We actually need more empathy for our end users. We need to understand not just that the user wants something, but the situations in which they're likely to need it, and how we would know that we're satisfying the need in that particular moment, and so on. That involves product and tech people talking to each other, and the people working on applications talking to the people providing infrastructure, so that these things are actually measurable and reportable. It takes a lot of collaboration and teamwork to do this properly.

What Could Go Wrong, from the DevSecOps Perspective?

Hastie: I was going to ask Sean, bringing the DevSecOps perspective in here, what can go wrong?

Davis: So much. I think it's awesome that we're really dispelling this in talking about shift left. For a lot of organizations, it's almost like a marketing term, because when we talk about this process, it is very circular for us. We also have to remember there are a lot of legacy companies out there that aren't even really fully into DevOps yet, much less DevSecOps. When we talk specifically about security, especially testing security, it's quite separate from the regular pipeline testing that you have. I think we do shift left with a lot of things. Imagine the test occurs in production and we gather valuable information from it. That information has to be moved back into the process to ask, what did we learn that can improve a stage that occurred before we arrived at delivering this value to our customer? We tend to think about legacy pipelines linearly, but they work in a circular motion, almost like a set of rings, with larger loops occurring within those rings. It's not even really a circle; it's similar to systems thinking. It's a system of systems, but a circle of circles.

Instead of thinking of it as shift left or shift right, from a security perspective we really want to look at the life experience of the code that's being developed, from the moment it first gets defined and created. Think of it like a child: when the child is conceived, it's put into that source repository, and what needs to happen and be tested to make sure it's secure? Then as it goes through the pipeline, how do we make sure that we guide it safely within guardrails that respect the delivery needs not only of our customer, but of our internal customer, the development teams? Testing in production helps us get a lot of that valuable information, as well as the internal testing. Going back to what Orit said: just because we do it in production doesn't mean we shouldn't do it before production, too. I agree 100% with that.

When you think about security, just because we're securing the endpoints for a customer doesn't mean that we don't need to look at security with consideration to the developers and the engineers. When we impose certain testing requirements, think about when your development teams went from just writing code to now having to operate in TDD. How tragic and life changing that was for them. They're like, "You mean now, not only do I have to write code, I have to write tests too?" They have to come from somewhere. Looking at that from a security perspective, we think about a lot of these testing processes, and how it impacts every step of the process, not just outside.

Testing in production provides that value to say, here are things we can't replicate in pre-prod. Maybe the load the customer has, the unique geographic needs they have that we can't just test prior to production.

Prins: What I noticed among our customers, and I realize it's hard for a lot of us, is that it's relatively easy to shift left, because that's a people and a technology area. If you start talking about testing in production, we're talking, often for the larger enterprises, about a process change, where compliance comes in. Then all of a sudden: testing in production, really? The process change to move to the other side of the house is so much bigger, and you really need people that come from modern tech, or that are firm believers, to push over that hurdle. So back to your question, Shane, how do we implement this? You need to be aware that the right side of the house, moving into production, is a different animal, and therefore you need a different approach to solve it, and you really need to build friendships in other departments rather than only bonding with your engineering teams.

The Shift Left Mentality and Testing in Production

Davis: Do you think that with the shift left mentality that people have, that the goal or the idea was really, everyone says that every environment before production should be like production? That that's what they meant? If we're going to test in production, can we make sure that every environment before that simulates what that testing in production would feel like?

Prins: No, I don't believe so. What I observe, at least, is that for the more modern applications that live in a modern architecture, it's relatively easy to build some immutable stuff and mimic the behavior as it is in production. If you talk about 80% of the world's IT, it is not as easy, and spending millions to get production replicated throughout all environments is not possible. I'd rather spark conversations about what fits where. I think that's the more important conversation to have: what can you improve in the beginning, and what can you start doing additionally at the end? That's a more gradual conversation and approach to get there.

Examples of Situations That Have Gone Well and Situations to Avoid

Hastie: Can we perhaps talk some real-world examples? What are some situations where things have gone well, and some potential, here be dragons, things to avoid?

Burrows: I'm old enough to be able to look back on past careers, and I'm going to look back on my time in investment banking. One thing I noticed that the best banks did, and I worked for one of the best ones in the '90s, was that they understood that different systems had different rates of change, and the architecture really supported that. Front office systems were decoupled from middle and back office systems; they talked to each other asynchronously, and so on. That's what made it possible. It gave front office systems in particular a lot of flexibility: they could deploy on any day of the week, and on rare occasions even intraday, though that's not something we liked to do. That flexibility is actually super important, because you can't control all the timelines. There are internal systems, but you're also connected to markets, and they might upgrade on a particular day. You've got to be ready with your code in place to flick the switch on the appropriate day, and to be really confident by the time trading starts that it is actually going to work. That can involve things like smoke tests, and dummy trades, and all these kinds of things.

Even in a very regulated environment like banking, it can be very legitimate to do certain test activities in production, and even things that actually potentially have a business impact, and not just a technical impact. For things important enough, you will find counterparts that'll be interested in testing with you and so on. I remember, again, going back 20 years, even the hedge funds would be testing which bank was going to be quickest that day. That's real business conducted with real banks but with a technical agenda as well. I think in every industry, in the most surprising corners of the most surprising industries, you can find some very legitimate examples of this.

Hastie: Anyone else want to bring in some examples?

Prins: Yes, a really operational, pragmatic example. We're running our product hosted on AWS Frankfurt, and for now, a lot of different countries are using that same system hosted in Germany. You can imagine that if you, Shane, reached that server from New Zealand, you would see different behavior than if Sean reached the system from the U.S. What we recently did is a performance test to validate in which regions performance is still acceptable, and in which regions we see a lot of customer growth and hence need to start replicating or moving towards multiple areas across the globe. We're using that operational data to make product decisions: it's about time to go to that region, or not yet, because almost no customers in that particular region are reaching our systems. Using the data, and that understanding of how the child behaves, is important for determining, from a product and an infra perspective, where to invest and how to grow.

Irani: From our standpoint, I think things that have gone really well are empowering teams to feel like, yes, you don't have to spend several weeks trying to test this, and then worry that it's going to break in production. LaunchDarkly allows you to deploy when you want and release when you're ready, so teams can continuously deploy without actually turning these features on for end users or having end users even know these things exist, aka dark launches. That's gone really well. The downside of it is when we're occasionally overconfident and push to production and then something breaks, and that's actually created a lot of roadmap requests from us to us, because as you imagine, we dogfood LaunchDarkly heavily. Most of the feature requests actually come from our own engineering teams and DevOps teams. Yes, one of the examples of something that we've organically felt the need for after testing in production is this concept of a kill switch, which is, ok, I've pushed something to production. I'm testing in production, but it's clearly impacting other features that are live for customers in production and impacting the infra for them, and so we've had to kill switch a feature or pull it back. That's something that's gone well, but simultaneously not gone well.
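The kill switch Karishma describes can be sketched as a layer that overrides whatever rollout logic is already in place. This is a minimal in-memory sketch; LaunchDarkly's actual API is different, and the class and method names here are hypothetical.

```python
class KillSwitch:
    """A global off-switch evaluated before any per-user rollout rule."""

    def __init__(self):
        self._killed = set()  # names of features that have been killed

    def kill(self, feature):
        """Disable a feature for everyone, immediately."""
        self._killed.add(feature)

    def restore(self, feature):
        """Resume normal rollout evaluation for a feature."""
        self._killed.discard(feature)

    def evaluate(self, feature, rollout_decision):
        # The kill switch always wins over the normal rollout decision.
        return False if feature in self._killed else rollout_decision
```

Because the kill check runs first, pulling back a misbehaving feature is a single state change rather than a redeploy, which is what makes it usable mid-incident.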

Golowinski: I have another example. It's actually close to Mike's, because it was a customer from the banking industry. It's interesting that even in banking, with its highly regulated environment, people are still testing in production, because there really isn't a good substitute for it. What really works well for this customer, but also for others, is carefully defining the control group, because testing in production is really all about feedback. Gathering that feedback is important: you need to know what you're gathering and from whom. You have to be fully transparent with the control group that they are in that testing group, so they can let you know whether something is working well or not. In the banking industry, you know that these transactions happened on this control group, so even if something goes wrong, it's pretty easy to monitor and fix. If everything works well, then rolling out, whether with a feature flag and a controlled audience, or with a canary or incremental rollout, is really easy at that point, especially when we're talking about cloud native and being able to really quickly drain the traffic and go back to a different server. I think we have a lot of advantages in today's technology, even in these highly regulated environments.

Burrows: Orit raises something really interesting and important. Actually, that ability can actually lead to things like how do we do cutovers in production other than Big Bang? How can our users slowly adopt certain things? What are the business controls around that in the banking environment? Which trading books, for example, are going to be working with the new functionality, and so on? If you build the controls into the technology, then you've got a lot of control over how the rollout happens, or how a cutover happens, for example, about perhaps a completely new business process, or even a completely new technology. It very rarely needs to be all or nothing. You can find ways of configuring it so that things can be incremental. That's really helpful.
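Mike's point about building the controls into the technology, deciding, say, which trading books work with the new functionality, can be sketched as an allowlist-based router. All names here are hypothetical; the point is that cutover becomes a data change rather than a deployment.

```python
def route_trade(book, migrated_books, legacy_handler, new_handler):
    """Send a trade to the new system only for books that have been
    explicitly cut over; every other book stays on the legacy path."""
    handler = new_handler if book in migrated_books else legacy_handler
    return handler(book)

# Example cutover state: only one desk's book has migrated so far.
# Widening the rollout means adding a book name, not shipping code.
migrated = {"fx-spot"}
```

Incremental cutover then amounts to growing `migrated_books` book by book, with an obvious way back: remove the book from the set.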

The Architecture Landscape, Edge vs. Core

Prins: A question to you, Mike and Orit. What I noticed in my time at ING is that these types of activities happen at the edges of the landscape, often the customer-facing, easy stuff that can be changed to a more modern architecture, and much less towards the inside. That experience is from four or five years ago, though. Do you see a change, with this going deeper into the architecture landscape, towards the core of the systems, or is it still staying at the edges?

Burrows: I think it always, to an extent, was at the core. Trading systems have to be the glue between the markets on the outside and the settlement systems at the back. Actually, that's where a lot of flexibility actually happens. In the end, each business has its own way of thinking about what business it's doing, and it has to be represented and risk managed in the trading system. You need to be able to evolve at different speeds, and understanding that is really crucial.

Golowinski: I definitely agree with Mike. Especially, when we talk about feature flags, it's just really easy to think about UI and changing of color, or changing of buttons. When you think about these canary releases, it's really in the core of the infrastructure, so it's definitely not only in the user interfacing segments. I've read lots of different papers about these dark launches, or incremental rollout even of databases and changing of schemas. Everything that we can do incrementally, is much better, because we get the feedback really quickly. We can act upon it. We can fix things. Again, feedback is really crucial here.

Burrows: Our schema upgrades went around the world and our system upgrades went around the world and so on. Everything was moving almost all the time, but within a particular area we always knew exactly what we were doing.

Davis: I just wanted to add to Mike's comment from a little earlier; it got me thinking about when I used to work for a hosting company. One thing we found really valuable with feature toggling and the kill switch, which we used a little non-traditionally, was for alpha and beta customers on platforms we were still building, brand new offerings that weren't on the market yet, cloud related things. We allowed those alpha and beta users to basically drive what we built: these are the things we want, what's the next package you want published and available turnkey? If somebody tried to abuse it, where someone would come in and say, "I'm going to use this package," and then spin up 100 of them on a reseller account, we could quickly flip that kill switch to say, we don't want to allow this feature on this platform. Whereas if that was something globally offered, we'd be impacting our entire user base. So we used it in a controlled manner, with a smaller set of canary infrastructure, and even some A/B testing. I know that's controversial; some people believe strongly in it, other people not so much. It allowed us to rapidly develop for the largest demand for our products, but in a safe way: if we weren't prepared, or it wasn't fully formed, and someone out there was abusing it from a security perspective, we could hit one button and those were no longer available.

Using Shadow or Synthetic Traffic to Test in Production

Hastie: A question from the audience, from Gary: can you speak to the use of shadow or synthetic traffic to test in production? We're strangling a monolith and have many opportunities to test new microservices and service-based features as they replace those in the monolith.

Burrows: I can go back to the example I had before. We're going back 25 years, banking systems talking to each other through messaging middleware, self-describing messages, and all the rest of it. To us, now, we hardly use those terms anymore, because it's just ubiquitous. It's everywhere. JSON and all those kinds of things, internet protocols, all the rest of it make these things crazy easy, but 25 years ago, we're talking proprietary technologies. When you do have things connected by those protocols, then it's quite easy to tap off traffic and feed things from a live system, for example, to a test system. It can be an important way to do functional testing, volume testing, and so on. That architecture is naturally evolvable. Different things can move at different speeds. If you are trying to do something with a monolith, the more that you can make things talk to each other asynchronously, the more flexibility, the more power that you have, and the ability to change the plumbing even potentially while systems are still live. That is very powerful.

Irani: While Gary's question is a little more catered towards testing the migration or breakdown of a monolith into microservices, for me as a product manager, something that's really interesting about testing in production is that it forces me to know, or to decide, what good is. What is the integration test, as a product manager, that I'm satisfied with? Testing with real-life users will always yield better data for that. I can check whether something I've tested in production, or am testing in production, is yielding the results I expect, whether that's through an A/B test or through a small group of users that I'm closely watching. It leads to more purposeful product development and, quite honestly, a healthier team culture.

Golowinski: First of all, I think that if you're trying to simulate production traffic, that's not going to be close enough. This is exactly why we want to test in production: we can't find a good substitute for it. A really interesting emerging technique, alongside testing in production, is shadowing production traffic. The idea is that you mirror traffic coming into production: you duplicate that traffic and send it into a staging or pre-production environment that's similar to production, and you can see how the new functionality behaves with this real-life traffic. It's still not the same as testing in production. It's mimicking it. It's almost there, but it's not exactly that.
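The shadowing pattern Orit outlines can be sketched as a handler that serves every request from the live path while replaying it against a staging handler and recording divergences. This is a minimal synchronous sketch with hypothetical names; real deployments usually mirror at the proxy or service-mesh layer, asynchronously.

```python
def handle_with_shadow(request, primary, shadow, on_mismatch):
    """Serve from the primary (production) handler; replay the same
    request against the shadow (staging) handler for comparison.

    The shadow's response never reaches the user, and a failing shadow
    must never affect the live path.
    """
    response = primary(request)
    try:
        shadow_response = shadow(request)
        if shadow_response != response:
            on_mismatch(request, response, shadow_response)
    except Exception:
        pass  # a broken shadow must not take down production
    return response
```

The invariant to preserve in any real implementation is the same as here: the user always gets the primary response, and mismatches are only logged for later analysis.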

Prins: I do believe Gary also referred to synthetic data. I think synthetic data in particular is excellent to use early on, because you can generate whatever you need; you can come up with all the patterns you can possibly imagine, even more patterns than exist in production. These are the perfect ones to shift left, to use early in the process. In particular, if you're on slightly older infrastructure, with less ability to do all the fancy feature toggling and A/B testing, this is the way to start.
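Andreas' point, that synthetic data can cover patterns beyond what production traffic contains, can be sketched as a seeded generator that deliberately over-samples edge cases. The record shape and field names here are hypothetical, purely for illustration.

```python
import random
import string

def synthetic_users(n, seed=0):
    """Generate deterministic synthetic user records, deliberately
    mixing in edge cases you may rarely see in production traffic:
    empty names, max-length fields, apostrophes, non-ASCII text."""
    rng = random.Random(seed)  # seeded, so test runs are reproducible
    edge_names = ["", "a" * 255, "O'Brien", "李雷"]
    users = []
    for i in range(n):
        if rng.random() < 0.1:  # over-sample the edge cases
            name = rng.choice(edge_names)
        else:
            name = "".join(rng.choices(string.ascii_lowercase, k=8))
        users.append({"id": i, "name": name, "age": rng.randint(0, 120)})
    return users
```

Seeding the generator matters: a failing test case can be reproduced exactly, which is much harder with captured production traffic.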

Davis: It really comes down to what it is you're trying to accomplish by testing in production, too. Sometimes we want to test the security of things; we want to test the resiliency, the customer's experience, the performance. Each of these has different markers that we're going to leverage, and feature toggling, A/B testing, blue-green, canary, synthetic transactions, whatever method we're using, are going to be geared towards something very specific. Synthetic transactions are great if we just want to see, can we handle this load? They don't tell us anything about the traffic that's being generated by our customers. If we wanted to test against something like real people trying to SQL inject my site, I'm not going to get that from a synthetic test any day of the week. Feature toggling is not really going to buy me a whole lot there, but kill switches will be valuable. When you're looking at testing in production, it's important to understand the purpose of what you're trying to do.

I think it's also important to ensure that you know what your blast radius is going to be if you have a problem. Because we could easily overrun error budgets, or we could have other challenges with teams that aren't prepared to handle what our tests have generated. Then we've negatively impacted the culture and what we're trying to accomplish with testing in production, because then people are like, "We're not ready for this, because we broke something." We need to prepare people to say, "Things are going to break. We are going to have challenges with this, but be prepared, and here's how we're going to mitigate that," maybe a smaller percentage of users or a smaller test base.

Burrows: Can I just pick up on that purpose word?

Davis: Absolutely.

Burrows: Shane wanted to get us on to A/B testing. I'm going to say straightaway, I have a very ambivalent attitude towards A/B testing. It's great if you've got the volumes that enable you to run a statistically significant test. But anywhere that someone uses a tool to make a decision for them, maybe that's a substitute for strategy, because they don't have one, or they're not clear about their purpose, their values, their design values, their core values, their production values, all those kinds of things. There's a danger with A/B testing that it's going to drive you in the direction of addictive engagement, not of actually meeting people's needs. I think, again, we need to be really clear about what needs we're meeting, and how we would know that those needs are being met at the right time. If we look to measure those things, and measure the user experience, as opposed to just measuring very basic measures of user engagement, it's much healthier. Just be very cautious when letting a tool make what should probably be a strategic decision. That doesn't just apply to testing in production; it applies to things like ROI calculations as well. When you've got a clear purpose, and a strategy to achieve that purpose, then decision making is much easier. The tools aren't a substitute for either of those things.

Culture Shifts Needed to Move Towards More Testing in Production

Hastie: My role at InfoQ is I'm the lead editor for culture and methods. What are the culture shifts that need to happen? I've heard the term culture a couple of times. How do we bring in and what are the culture shifts that need to happen if we aren't doing this, moving towards more testing in production?

Prins: One is the fear around compliance and all those things, which I believe is almost a cultural thing: stepping over it and understanding that, with all the tech that is out there these days, you can actually do this in a really controlled way, even more controlled than you ever did before. The other one, which I still notice, is people arguing, that's not my cup of tea: that's operations, that's infrastructure; no, I'm the developer. I'm putting it really black and white, but that's a pattern we still see in the market. I think having that conversation about an autonomous unit, where you are responsible for the entire lifecycle, is super crucial to understanding how the baby or the child is growing up in production and how it's behaving. Really talk about what the team's responsibility is. If you're in a slightly bigger organization with separate units, that is hard to break out of. Then I would still recommend, and challenge people, to go out to their peers in other departments and start building that discipline of, can we connect and understand how my code is behaving in production, and what the impact is to the customer?

Then the third one, I believe, is actually product management. Not to blame anyone, but it's the understanding that it's not me as a product manager that decides what the roadmap looks like, pushing out what needs to happen based on my knowledge and my understanding of the market. It's actually opening up to the idea that there are users that can help me, or behavior in production that can help me decide what direction to take. I do think product management utilizing the source and the knowledge that comes from production is crucial for decision making.

Irani: I couldn't agree more. Up until recently, part of the laziness I've had as a product manager about doing more of this with my team has been related to just how manual the process is. Say I decide we're going to release this to a subset of customers three days from now at 5 a.m. I don't want to wake up at 5 a.m., I don't know about you. Similarly, deciding we're going to run this test, and only if the conversion is above 30% will we start rolling it out to customers. Or, I need to get approval from my eng lead before I can turn it on in production, or promote it across environments. It's just a lot. If the process is completely manual, I want to trim it down just to save time.

This is a heavy focus for us at LaunchDarkly, where we're investing in what we internally call feature workflows. The idea is, as a team, you have a release workflow. You have an idea of when you want to turn something on for your customers, when you want to test it with a small beta group, and when you want to get approval from an admin or team lead before you turn it on in production. You want to coordinate this across teams. This has been the hurdle a lot of teams need to overcome: for product managers to work more closely with the engineering team or the SRE team to ship something to production. That's exactly one of our heavy focuses.

We have introduced the option to schedule things, and to select at which stage you want approvals from which team members. All of this makes it easy for, for example, financial organizations that have blackout periods, where engineers can't ship something between the hours of 9:00 and 5:00 without breaking something. They can just schedule that blackout or maintenance period, and all their requests get queued up, so everything is scheduled. Similarly, if you need approval, you can collaborate and get peer review before you merge something into production. You can define all of this. An important thing, I think, is to take the burden off engineers of having to schedule these things and work it out with product managers, and off product managers of, quite honestly, having to nag their engineering teams. Engineers should focus on deploying, and product managers can collaborate with them on the releases.
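The gating logic described here can be sketched in a few lines. This is a hypothetical helper, not a real LaunchDarkly API: it simply checks a production change against a set of required approvers and a blackout window before allowing it through.

```python
from datetime import datetime, time

def change_allowed(now: datetime, approvals: set, required: set,
                   blackout: tuple = (time(9, 0), time(17, 0))) -> bool:
    """Gate a production flag change on approvals and a blackout window.

    Changes attempted during the blackout window (e.g. a financial
    org's 9:00-17:00 freeze) are deferred, and every required
    approver must have signed off. Hypothetical sketch only.
    """
    start, end = blackout
    in_blackout = start <= now.time() < end
    return required.issubset(approvals) and not in_blackout

# A change approved by the eng lead, attempted at 18:30 -> allowed.
ok = change_allowed(datetime(2021, 9, 5, 18, 30),
                    approvals={"eng-lead"}, required={"eng-lead"})
print(ok)  # True
```

In a real system the deferred changes would be queued and applied automatically when the window opens, rather than rejected.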

Burrows: Following on from Andreas' point, and adding to the idea of working backwards: what if, for every piece of work we did, we knew we were going to be accountable for sharing what we'd learned about the customer, the organization, our product, our platform, and ourselves? That's the way I'm rethinking the service delivery review: as a way of making sure there's a moment for that learning to be shared and captured, and perhaps to provide more input to the process, making the whole thing a lot more creative and generative. Little tweaks like that to your meeting design, organization design, and review rhythms can have quite an effect on the cultural things we were talking about, making sure we get the most from our activities in production.

Golowinski: To build on everything that was said here, I totally agree. I think it's also important to understand what kind of testing we're doing in production. For example, if we're doing a canary release, which operates more at the infrastructure level, the person responsible for that rollout is usually going to be an SRE or an Ops engineer. Feature flags are much more flexible. You can change them at runtime, change the rules as you go, and collect feedback and adjust. Basically anyone on the team can do it. That gives a lot more responsibility to the people on the team to monitor what's going on in production, rather than pushing it onto the Ops team, and to play around with the rules and the segmentation of the audience. In that sense, this type of testing in production gives a lot more autonomy and responsibility to individual team members. Segmentation is much harder with canaries: you could segment by domain or geolocation, for example, but you can't segment users in such a flexible way, like saying, if you log in to my system using this button, or if you are using this feature. Feature flags give you that flexibility, and you don't need to redeploy anything when you change a rule. It's just great, because anyone on the team can participate.
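The runtime targeting and percentage rollout described above can be illustrated with a minimal sketch. The rule format and function names here are invented for illustration, not taken from any real flag system; the key idea is that rules are plain data evaluated at request time, and rollout buckets are derived from a stable hash so a user never flips cohorts between evaluations.

```python
import hashlib

def in_rollout(user_id: str, flag_key: str, percent: int) -> bool:
    """Deterministically bucket a user into a percentage rollout.

    Hashing user_id together with the flag key gives each user a
    stable bucket from 0-99, so re-evaluation never flips cohorts.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

def evaluate_flag(user: dict, rules: list, default: bool = False) -> bool:
    """Evaluate targeting rules in order; the first match wins.

    Each rule is a dict like {"attr": "segment", "value": "beta",
    "serve": True} -- a simplified stand-in for real targeting rules.
    """
    for rule in rules:
        if user.get(rule["attr"]) == rule["value"]:
            return rule["serve"]
    return default

# Example: serve the new checkout to beta users, plus 10% of everyone else.
user = {"id": "user-42", "segment": "beta"}
rules = [{"attr": "segment", "value": "beta", "serve": True}]
enabled = evaluate_flag(user, rules) or in_rollout(user["id"], "new-checkout", 10)
print(enabled)  # beta users always match the rule, so True
```

Because the rules are data rather than code, changing the audience means updating the rule list, with no redeploy, which is exactly the flexibility being contrasted with canary releases.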

Different Approaches to Take When Delivering Platforms to Other Business Units

Hastie: What different approaches would you take, if any, when delivering a platform upon which other business units develop or deliver business outcomes, as opposed to those business features themselves?

Burrows: I think my example from banking is relevant, but a more recent example would be from the government digital space, certainly in the UK, which has been a very exciting place to work; I feel privileged to have worked in it. Very often, you've got nice new citizen-facing frontend systems, and behind them the older systems. The older systems are often harder to change, so you have to find a way for the new systems and the old systems to talk to each other. We found that collaboration between teams was really crucial, and then engaging the change management folks in that as well. I worked on two of the so-called exemplar projects. We had something like a joker card we could raise to get stuff out into production, but we still had to coordinate with other people or the whole thing wasn't going to work. You really do have to pay attention to the wider organization and to the relationships between teams. Some of those relationships were actually quite difficult at times, but the effort we put into making them work better absolutely paid dividends. It got to the point where we had members of both teams working in the same room, and there was an incredible difference in productivity and in the health of that relationship when we achieved that.

The Bleeding Edge

Hastie: What's next? What is the bleeding edge?

Davis: Maybe automated production testing, like chaos engineering, driven by AI and ML.

Prins: I would not immediately call it AI. I would argue it's the use of data. Karishma was talking about whether we can use more of the data and automate that process; I would rather we utilized the data ten times more than we do these days. Because you can, to your point, build a workflow that does the steps, but how do you actually use the data? How did that feature behave over the last five releases? What was the change you made, to which piece of the code, that caused the issues? Can you do something with that? I would argue the use of data, without going too fancy with AI, with at least some ML, would definitely be there.

Irani: For me, it's something similar, but I almost want to set up predictive smart rules to create sophisticated user experiences. I want to have five versions of a single application or experience out to customers simultaneously. Depending on performance, whether that's conversion, page views, any numeric measure, or API response time, I want to route traffic to the most optimal experience. Then I want this to be a recursive thing across features, so that I, as a product manager, feel confident that my end users are seeing the best possible experience to help us achieve our business outcomes.
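One established technique for this kind of automatic routing is a multi-armed bandit. The sketch below uses a simple epsilon-greedy strategy, which is my illustrative stand-in for the idea, not any vendor's implementation: most traffic goes to the variant with the best observed conversion rate, while a small epsilon share keeps exploring the others in case performance shifts.

```python
import random

def pick_variant(stats: dict, epsilon: float = 0.1) -> str:
    """Epsilon-greedy routing across live variants.

    stats maps variant -> (conversions, impressions). With
    probability epsilon we explore a random variant; otherwise we
    exploit the best observed conversion rate. Illustrative sketch.
    """
    def rate(variant):
        conversions, impressions = stats[variant]
        return conversions / impressions if impressions else 0.0

    if random.random() < epsilon:
        return random.choice(list(stats))  # explore
    return max(stats, key=rate)            # exploit

stats = {
    "v1": (30, 100),  # 30% conversion
    "v2": (45, 100),  # 45% conversion, current leader
    "v3": (10, 100),
}
print(pick_variant(stats, epsilon=0.0))  # epsilon=0 -> pure exploit, picks v2
```

Running this per request, and feeding each outcome back into `stats`, gives the recursive, self-optimizing behavior described: traffic drifts toward whichever experience is winning on the chosen metric.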

Burrows: Actually, you remind me of a slightly less technological version of the same thing: making some of that performance more visible, not just internally, but to our users as well. I've found that closing that loop has been a source of real insight, because sometimes our internal measures of how things perform are at variance with user perception. Understanding that and doing something about it has been a really important route to better performance, better capacity management, and so on.

Golowinski: My dream next phase of testing in production would be enhanced observability, with automation built on top of it: some way for us to understand dependencies, understand the metrics and feedback we're collecting, and automate based on what we did in the previous scenario that made things work, and just have the systems do that for us.

Recorded at: Sep 05, 2021