Transcript
Byars: I want to talk about the journey of evolving an API without versioning. I like to start this with a little bit of a story. Just for background context, when I say API, a lot of folks immediately jump to something like a REST or a GraphQL API, and those certainly fit the criteria. So do Java APIs that operate in-process, and event-based APIs: they still have some agreement that needs to be made between their consumers and producers. A lot of the talk is dedicated to the complexity of that agreement. This is based on a real story, but it has been changed, because the story itself isn't public.
The context: we want to book a cruise, for ourselves and a partner, for a vacation. This sequence diagram is an overly simplified view of what happens, from a systems standpoint, when you book a cruise. We have a reservation system, and the flow is basically a two-phase commit. You hold the room, pre-reserving it until payment has been collected through a payment gateway. Then the reservation itself is confirmed in the underlying reservation system. When this works well, everybody is happy. What happened in this real scenario that I've abstracted is that there was a significant spike in load. That spike in load forced the reservation system to send back an unexpected error on that last step, confirming the reservation, an error that the team that developed the booking service had never seen before. Normally, when you run into unexpected errors in systems, you get some unpredictable behavior. In this case, the unpredictable behavior was fairly catastrophic for the organization, because they had built a retry loop around unexpected errors in the booking service that replayed the entire workflow.
Under load, at peak volume, I might try to book a cruise. I'd pre-reserve the room, my credit card would be charged, and the confirmation step would fail with that unexpected error. Retry: pre-reserve another room, charge my credit card again, and so on. That loop continued until either a customer had pre-reserved every room on the ship, or they had maxed out their credit card and the payment gateway itself returned an error, or there was some fraud-based alert from the payment gateway. That was obviously a big PR disaster for the organization. It caused a lot of consternation; it was a very visible, CNN-headline kind of failure. All based on the fact that the agreement between the API producer of the reservation system and the API consumer of the booking service did not completely cover the surface area of responses available.
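To make the failure mode concrete, here is a minimal sketch of the antipattern, a hypothetical reconstruction rather than the organization's actual code (preReserve, charge, and confirm are placeholder names): the retry wraps the whole workflow, so every attempt repeats the non-idempotent steps that already succeeded.

```javascript
// Hypothetical sketch of the antipattern; reservations and payments stand in
// for the real downstream services.
async function bookCruise(room, card) {
  for (let attempt = 0; attempt < 5; attempt++) {
    try {
      const hold = await reservations.preReserve(room);  // succeeds again on each retry
      const payment = await payments.charge(card, hold); // charges again on each retry
      return await reservations.confirm(hold, payment);  // the step that threw under load
    } catch (err) {
      // Nothing is rolled back before the next attempt: another room is held,
      // the card is charged again, until the ship or the credit limit runs out.
    }
  }
  throw new Error('booking failed');
}
```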
Background
My name is Brandon Byars. I am head of technology for Thoughtworks North America. This talk is based on a significant amount of experience that I've had in API development throughout my career. I've written an open source tool called mountebank, and following its API over nearly a decade will be the baseline of this talk. I've also written a few articles on martinfowler.com; this talk is based on one that I haven't yet published. It's actually long been in my queue to finish, and this talk is a little bit of a forcing factor for me to do that. This is all based on real world experience. I have led a number of platform engagements. We consider these API platforms a really good way of scaling development inside organizations. One of those articles, the enterprise integration using REST one, is quite dated, maybe 10 years old at this point.
This adaptation of Jamie Zawinski's famous quote on regular expressions is something that I wrote in that article: "Some people, when confronted with a problem, think, I know, I'll use versioning. Now they have 2.1.0 problems." Versioning is oftentimes seen as the de facto standard approach to evolving an API in a way that makes the agreement on backwards incompatible changes, on breaking changes, explicit, forcing the consumers to upgrade for the new functionality, but in a very managed way. That's a very architecturally sound strategy. There's a reason it's used so widely. You see here on the left an adaptation of the old famous Facebook slogan of moving fast and breaking things. As you see from the quote that I put on the right, I like to challenge the idea of versioning as the default strategy, because I think it does cause a lot of downstream implications. The fact that all consumers do have to upgrade is itself a point of inconvenience for many of those consumers. This talk is really dedicated to exploring alternative strategies that produce more or less the same results, but with different tradeoffs for the consumers and different tradeoffs for the producer as well.
Contract Specifications as Promises
When we talk about APIs (again, REST APIs, GraphQL, Java APIs, events, it doesn't matter), we have something like a contract, the specification. Of course, in the REST world, OpenAPI tends to be the 800-pound gorilla. There are of course alternatives, but it is a pretty widely used one. It's easy to fall into the trap as technologists of thinking of that specification as a guarantee. I really like the word promise. Mark Burgess came up with promise theory. He was big in the configuration management world (CFEngine, Puppet, Chef, and so forth) that led to the infrastructure as code techniques we use today, and he has a mathematical basis for his promise theory in that infrastructure configuration management world. For more lay audiences, he wrote a book on promise theory, and this quote came out of it: "The word promise does not have the arrogance or hubris of a guarantee, and that's a good thing." Promises fundamentally are expressions, communication patterns that demonstrate an intent to do something, but promises can be broken. As we saw in the reservation system example, promises can sometimes be broken in unexpected ways that lead to cascading failures.
Following the Evolution of a Complex API
I'd like to explore that idea of making best effort attempts to solve customers' needs through some storytelling. I mentioned this open source product that I've managed for nine years now called mountebank. It's a service virtualization tool. If you are familiar with the mocks and stubs that you might use to test your Java code, for example, service virtualization is a very similar construct; it just exists out-of-process instead of in-process. If your runtime service depends on another service (if you're building the booking service, you depend on the reservation service), and you want to have black box, out-of-process tests against your booking service in a deterministic way, where you're not relying on test data being set up in the real reservation system, you can virtualize the reservation system. Mountebank allows you to do that. It opens up new sockets that listen for requests matching certain criteria, and respond in a way that you, the test designer, set up. It's a very deterministic way of managing your test data.
There's more to it than this picture on the bottom. In the book that I wrote, I had to draw a number of diagrams that described how mountebank worked. This one covers more or less the back part of the process, generating the response. Mountebank gets a call. It's a virtual service that needs to respond in a specific way, returning the test data relevant to your scenario. What it does is it grabs a response. There are multiple ways of generating a response; we'll look at a couple. Then the bulk of the storytelling is going to be around this behaviors box. Behaviors are post-processing transformations on those responses. We'll look at some examples, because there has been a significant evolution of the API in sometimes backwards incompatible ways in that space, all done without versioning. Then the core construct of mountebank is the virtual server. Mountebank calls it an imposter, but it's the same idea as a virtual stub.
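As a concrete example of the virtual server idea, this is roughly what you POST to mountebank's admin port (2525 by default) to stand up an imposter; the port, predicate, and body here are illustrative rather than taken from the talk.

```json
{
  "port": 4545,
  "protocol": "http",
  "stubs": [{
    "predicates": [{ "equals": { "method": "POST", "path": "/reservations" } }],
    "responses": [{ "is": { "statusCode": 201, "body": "{ \"roomHeld\": true }" } }]
  }]
}
```

Any request to port 4545 that matches the predicate gets the canned response, which is what makes the test data deterministic.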
Evaluating Options from a Consumer's Perspective
As we look at some of the different options, where versioning wins hands down is implementation complexity. When you version an API, you can simply delete all of the code that was there to support a previous version. You can manage the codebase in a more effective way if you are the API producer. I'm not going to use implementation complexity as a decision criterion, because I've already acknowledged that versioning wins on that front. Instead, as I look through a number of alternatives to versioning, I'm going to look at them from the consumer's perspective. These three criteria are the ones I'm going to focus on. When I say obviousness, think the principle of least surprise. Does it do what you expect it to in a fairly predictable way? Does it match your intuitive sense of how the API should work? Elegance is another proxy for usability. When I think elegance: is it easy to understand? Does it use the terms and the filters in a consistent way? Is the language comprehensible? Does it have a relatively narrow surface area because it's targeted to solve a cohesive set of problems? Or does it have a very broad surface area, and therefore hinder the ramp-up to comprehension, because it's trying to solve a number of different problems in an infinitely configurable way? Then stability is: how often do I, as the API consumer, have to change to adapt to the evolution of the API?
Evolution Patterns - Change by Addition
A few patterns now, all real-world patterns that came out of my experience maintaining mountebank. This snippet of JSON is an example of how you might configure a response from the virtual service. This is HTTP; mountebank supports protocols other than HTTP. This is, I think, a pretty good one. All this is doing is saying we're going to return a 500 with the text that you see in the body. You can also set up things like headers, for example. One of the first requests for a feature extension after releasing mountebank was that somebody wanted to add latency to the response. They wanted to wait half a second or three seconds before mountebank responded. The easiest thing in the world would have been to add that quite directly, some latency field in the JSON, which is pretty close to what I did. I added this behaviors element, with a little bit of a clumsy underscore, because I was trying to differentiate the behaviors from the types of responses. The is response type represents generation of a canned response, like you see here. There are two others: a record-and-replay type, called proxy, and a programmatic response type, called inject. Since those are not underscore prefixed, I thought I would put the underscore on the behaviors.
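Reconstructed from memory of the published contract rather than copied from the slide, the addition looked roughly like this: the existing is response is untouched, and the new _behaviors object carries the latency in milliseconds.

```json
{
  "responses": [{
    "is": { "statusCode": 500, "body": "Unexpected error" },
    "_behaviors": { "wait": 500 }
  }]
}
```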
More importantly, I thought that having a separate object, even though I only had one use case for it right now with this latency, was a point of extension. I think that's just a foundational topic to bring up. We talk a lot about backwards compatibility; there is a little bit of forward thinking that allows us to cover at least some forward compatibility concerns too. We can do something as simple as ensure, for example, that our API doesn't respond with a raw array, because as soon as you need to add paging information, you have to add an object wrapper, and you've made a breaking change (there's a hypothetical illustration of this below). Adding an object for extensibility is a pretty popular forward compatibility pattern. This is an example of that, even though I wasn't quite sure what I would use it for when I wrote it. This works pretty well. This was just a simple addition to an API, in the spirit of Postel's Law: you should be able to evolve an API in a way that doesn't change or remove elements and only adds to them. When I think about how that fits against the rubric that I mentioned earlier, I think this is as good as it gets. We should always feel comfortable as API producers adding new elements, being a little bit thoughtful about how to do that in a forward compatible way. This covers obviousness, elegance, and stability quite well.
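The raw-array trap, as a hypothetical illustration (not a real mountebank response): the first shape leaves nowhere to put paging metadata, while the second can absorb it as a purely additive change.

```javascript
// Brittle: a raw array means adding paging later is a breaking change.
const v1 = ["room-101", "room-102"];

// Extensible: an object wrapper lets paging arrive by addition.
const v2 = { rooms: ["room-101", "room-102"] };
const v2WithPaging = { rooms: ["room-101", "room-102"], paging: { offset: 0, limit: 2 } };
```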
Evolution Patterns - Multi-typing
That worked great. Then somebody said, I want the latency to be configurable. I mentioned that mountebank has this inject response type, which lets you programmatically configure a response. I thought maybe I would take advantage of that same functionality to let you programmatically configure the latency. What I did is I kept the wait behavior, but I just had it accept either a number or a string that represents a JavaScript function. I call that multi-typing. It worked well enough. It allowed me to fit within the same intention of adding latency, with two different strategies for how to resolve that latency: a number of milliseconds or a JavaScript function. It's not as obvious. It's not as elegant. I have not done this since that initial attempt. If I were to run into the same problem today, I'd probably add a separate behavior, something like wait dynamic. I think that's a little bit less elegant because it expands the surface area you have to understand in the API, but it's a bit more obvious. Obviousness also makes it easy, for example, to build a client SDK that doesn't have to have some weird translation layer, where you need different subclasses, or functions, or properties to describe the API in a way that gets translated to how the API actually works, because it's polymorphic in sometimes unhelpful ways. It works. I wouldn't recommend it. It certainly avoids having to release a new version to fix the API itself.
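Side by side, the two shapes the same wait field accepted; the function body here is illustrative.

```javascript
// Original: a fixed number of milliseconds.
const fixed = { _behaviors: { wait: 500 } };

// Multi-typed addition: a JavaScript function, serialized as a string,
// that computes the latency per request.
const dynamic = {
  _behaviors: { wait: "function () { return Math.floor(Math.random() * 1000); }" }
};
```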
Evolution Patterns - Upcasting
This third pattern is really my favorite: upcasting. It's a pretty common pattern. You see it a lot in the event-driven world, for example, but it really works for a number of different kinds of APIs. A subsequent behavior that was added to the list was this one around shellTransform. The idea was: mountebank has created this response, this status code, this body, but sometimes I want to post-process that JSON, to change it, to add some dynamic information, and I want to be able to use a shell program because I don't want to pass in a JavaScript function. Maybe I want to use Ruby, in this example, to do something dynamic. It was relatively easy to build that. Then what people asked for was: actually, I want a pipeline of shell programs. I want to have very small, targeted shell programs that each do one thing, and be able to compose multiple of them to generate the post-processed response. What I had to do was change shellTransform from its original string into an array, and it would execute each of those shell programs in the order they appear in the array. This one, assuming that both the string and the array can be passed, is a little bit less obvious, because it does have some components of that multi-typing we just looked at, but it's managed in a much more productive way. I think this is a very elegant and very stable approach, and it's one of the first approaches that I generally reach for when I try to evolve an API without breaking its consumers. Let me show you how it works.
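The before and after shapes (the script names are placeholders):

```javascript
// Original contract: one shell command, as a string.
const before = { _behaviors: { shellTransform: "ruby addTimestamp.rb" } };

// Evolved contract: an ordered pipeline of small, targeted commands.
const after = {
  _behaviors: { shellTransform: ["ruby addTimestamp.rb", "ruby addBookingId.rb"] }
};
```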
First of all, just to acknowledge: this is a breaking change. We changed the API from a string to an array. The new contract, the new specification of the API, lists only the array. It does not advertise that it accepts a string. I could have simply released a new version, changed the contract to the array, and asked any consumers who had the string version to update themselves. That would have been at their inconvenience. The upcasting allows me a single place in the code that all API calls go through. I have this compatibility module, and I call the upcast function on it, passing in the JSON that the consumer is sending in the request. You can see the implementation of that upcast function, or at least a portion of it, down below. I have this upcastShellTransformToArray, and there's a little bit of noise in there, but it's basically just looking for the right spot in the JSON and then seeing if it is a string. If it is, it wraps the string in an array, so it's an array of one string. It is managing, on the producer side, the transformation that the consumers would otherwise have had to do. It's adding a little bit of implementation complexity, although quite manageable because it's all in one spot in the code, in exchange for not having to inconvenience any consumers.
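A condensed sketch of that hook, not mountebank's verbatim source; the traversal assumes the stubs/responses shape shown earlier.

```javascript
// compatibility module: the one place every incoming request passes through.
function upcastShellTransformToArray (request) {
    (request.stubs || []).forEach(stub => {
        (stub.responses || []).forEach(response => {
            const behaviors = response._behaviors;
            if (behaviors && typeof behaviors.shellTransform === 'string') {
                // Legacy string payload: wrap it so the rest of the codebase
                // only ever sees the new array contract.
                behaviors.shellTransform = [behaviors.shellTransform];
            }
        });
    });
}

function upcast (request) {
    upcastShellTransformToArray(request);
}
```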
Another reason I really like the upcasting pattern is that it works a bit like Russian dolls: you can nest them inside of each other. This is another example. Over time, the behaviors, these post-processing transformations of the response, added a bit more functionality. You see several here. Wait, we mentioned; that adds 500 milliseconds. ShellTransform is now a list of shell programs that can operate on the JSON in the response. Lookup also takes a list. Copy takes a list. Decorate is just a string transformation that you can run. Then there's this repeat directive that allows you to return the same response to the same request multiple times in a row. Normally, mountebank works like a circular buffer, rotating through a series of responses, but you can ask it to hold the same response for three times before cycling to the next one.
I wanted to do this in a much more composable way, because it allows the consumer to specify the exact order of each transformation, which isn't possible on the left. On the left, there's an implicit order encoded inside mountebank, not published, not advertised. While some transformations operate one time at most, like decorate or wait, some can operate multiple times, like shellTransform and lookup. Repeat, it turns out, doesn't really belong there, because it's less a transformation on the response and more a directive on how to return responses when there's a list of them, from mountebank's standpoint. What I wanted was a list where every single element is a single transformation, and you can repeat the transformations as much as you want. If you want to repeat the wait transformation multiple times, that's on you; you can do it. It's very consistent. This actually allowed me to make the API, in my opinion, more elegant and more obvious, because it works more like consumers would expect it to work, rather than just demonstrating the accidental evolution of the API over the years. I rank this one quite high, like upcasting in general, but like all non-versioning approaches, it does require a little bit of implementation complexity.
The good news is that the implementation complexity for nested upcasting is trivial. I have the exact same hook in the pathway of requests coming in and being interpreted by mountebank, which calls this compatibility module, and all I have to do is add another function for the additional transformation after the previous one. As long as I execute them in order, everything works exactly as it should. We did the upcastShellTransformToArray, which took the string and made it an array. For the next change, all I have to do is add the next transformation after it. If you have a very old consumer that only has the original contract, upcastShellTransformToArray will upcast it to the next internal version of that contract, and then upcastBehaviorsToArray will update it to the published contract as it exists today in mountebank. The implementation was pretty trivial. It was just looking for the JSON elements in the right spot and making sure that if there was an array, it would unpack each element of the array in order, and if it was a string, it would keep it as is, ensuring that every single element in the behaviors array had a single transformation associated with it.
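A sketch of how the nesting composes; the exact field names in mountebank's source may differ, but the ordering is the point: each function upgrades one generation of the contract, so even the oldest payload is carried step by step to the current shape.

```javascript
function upcast (request) {
    upcastShellTransformToArray(request); // oldest contract: string -> array
    upcastBehaviorsToArray(request);      // next contract: object -> ordered array
}

function upcastBehaviorsToArray (request) {
    (request.stubs || []).forEach(stub => {
        (stub.responses || []).forEach(response => {
            if (response._behaviors && !Array.isArray(response._behaviors)) {
                // Split { wait: 500, decorate: '...' } into
                // [{ wait: 500 }, { decorate: '...' }], one transformation per element.
                response._behaviors = Object.keys(response._behaviors).map(key => ({
                    [key]: response._behaviors[key]
                }));
            }
        });
    });
}
```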
Evolution Patterns - Downcasting
The next instance of a breaking change managed without a version was far more complex. This one is going to take a little bit of a leap of faith to understand. I don't want to deep dive into how to use mountebank, or the mountebank internal mechanics, too much, but this one does require a little bit more context. I mentioned that mountebank allows you, as we've already seen, to represent a canned response that it'll return; for HTTP, we had the 500 status code and the body text. An alternative is a way of programmatically generating a JSON response: instead of is, you use inject, and you pass in a JavaScript function as a string, as you see here. At first, the function just took the original request that the system under test made to mountebank as a virtual service. There was a way of keeping state, so that if you were programmatically generating the response and maybe wanted to track how many times you had done that, you could keep a counter and attach that counter to the response you generated. And there was a logger. That was the original definition of the JavaScript function you could pass. Pretty early on, people wanted to be able to generate the response in an asynchronous way; maybe they wanted to look something up from a database or make a network hop, so I had to add this callback. Then, a little bit later, it turned out that the way I'd implemented state was too narrowly scoped, and somebody made a very good pull request to add a much better way of managing state. It was certainly inelegant, because I now had these two state variables in the JavaScript function. While I tried to do my best in the documentation to explain it, that certainly did not aid comprehension for a newcomer to the tool; it required having to follow along with the accidental evolution of the tool.
Anybody who's done a lot of refactoring, in dynamic languages or in languages in general, knows that one of the most effective ways to simplify that type of interface is to use this idea of a parameter object. As the parameters start to explode, you can replace them with a single object that represents the totality of the parameters. Then, of course, that makes a very easy extension point, because if I need to add a sixth parameter down the line, it's just a property on that config object. This is the new published interface for mountebank. Again, a breaking change, because the JavaScript function on the left now has to be transformed into the JavaScript function on the right. However, assuming mountebank can do that transformation for you, through this technique called downcasting, it's a pretty elegant way of managing the complexity in the producer instead of passing it on to the consumers. It's not quite as obvious, because there is a little bit of magic that happens underneath the hood. It's not quite as elegant, because you do have this legacy of the old parameters that somehow have to be passed around. But if done well, it can be very stable.
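The shape of the change, with the parameter names approximated from the talk's description rather than copied from the documentation:

```javascript
// Old published interface: parameters accreted one by one, including the
// two separately-scoped state arguments mentioned above.
function respondOld (request, state, logger, callback, imposterState) {
    /* build and return (or callback with) a response */
}

// New published interface: one parameter object; a sixth "parameter" later
// is just another property, an additive change.
function respondNew (config) {
    /* config.request, config.state, config.logger, config.callback */
}
```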
Here is what it looked like in this instance in mountebank. What we basically did was take the new parameter object, this config being passed in, and continue to pass the subsequent parameters, even though we don't advertise them; we don't call them out explicitly in the contract. You can't go to the mountebank documentation today and see that these parameters are being passed in. The only reason they are is for consumers on the old contract who have never updated to the published contract; those older parameters will still be passed in. That solves everything beyond the first parameter, the parameter object. It doesn't solve what happens with the parameter object itself, because that still needs to look like the old request that used to be passed in. That's why we have this downcastInjectionConfig call down here. That takes us back to the compatibility module. All of my transformations that manage breaking changes in the contract I can centralize in this compatibility module; I can go to one place and see the history of breaking changes through the API. When I say breaking changes, they are breaking changes to the published contract, but mountebank will manage the transformation from old to new for you. The consumer doesn't have to.
In this case, the config parameter object had to have state, the logger, and the done callback on it, so that for people using the new interface it would work as expected. For people using the old interface, it had to look like the old request. That's what this bolded code down below is doing. There's a little bit of internal mechanics that I mentioned: mountebank has multiple protocols, and the request carries protocol-specific fields, in this case for HTTP and TCP. What it would do is take all of the elements of the request, none of which, I knew, conflicted with the names of state and logger and the done callback. I had to have that expert knowledge, as the person who architected the code, to know I wasn't going to run into any naming conflicts. It would add all of the elements, like the request headers, the request body, the request query string, to the config object. While it was a parameter object that only had state and the logger and callback for most consumers, if you happened to have your code use the old function interface, it would also have all the HTTP request properties on it as well. It continued to work. That way, it was downcasting the modern code to the old version in a way that would support both old and new, and in a way that was guaranteed not to run into any naming conflicts.
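A sketch of that downcast, again condensed rather than verbatim: the config object is made to impersonate the old first parameter by copying the protocol request's fields onto itself, which is only safe because none of those fields collide with state, logger, or callback.

```javascript
function downcastInjectionConfig (config) {
    // Make config also look like the legacy first parameter (the raw request):
    // old-style functions that read config.method, config.headers, config.body
    // keep working, while new-style functions simply ignore the extra fields.
    Object.keys(config.request).forEach(key => {
        config[key] = config.request[key];
    });
}
```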
Evolution Patterns - Hidden Interfaces
This next pattern is, I think, where things get really interesting and really explore the boundaries of what is a contract and what is a promise, which I hinted at earlier. Getting back to the shellTransform: I gave a brief description of it. It allows you to write a shell program, in the language of your choice, that receives the JSON-encoded request and response and spits out a JSON-encoded response. It allows programmatic transformation. If you were writing this in JavaScript, for example, the way it was originally published, your code would look something like this. The request and the response would be passed as command line arguments to your shell program, quoted the right way, and you would have to interpret those in your code. That had all kinds of problems, especially on Windows. It has to do with the maximum length of the command line, which actually varies between operating systems and shells more than I understood when I wrote this code. On Windows it's quite limited; it's maybe 1048 characters or something like that. Of course, you can have very heavyweight HTTP requests or responses. If you are inputting that JSON and it's a 2000-character body, you've already exceeded the limit of the shell, the character limit itself.
There were also a number of quoting complexities: getting the JSON quoted the right way and escaping internal quotes for the different shells. I figured it out on Linux-based shells. The variety of quoting mechanisms on Windows-based shells, because there's more than one (you have PowerShell, you have cmd.exe, you have the Linux Cygwin-type ports), was more complexity than I realized when I went with this approach. What I had to do was have mountebank, as the parent process, put these things in environment variables, and let the child process read the environment variables. Very safe, very clean. I don't know why I didn't start there from the beginning, but I didn't. That's the reality of API development: you make mistakes. I wanted this to be the new published interface. Of course, I still had to leave the old one in there; I just removed it from the documentation. That's what I mean when I say a hidden interface. It's still supported, it's just no longer part of the published contract. If that worked, I think it's a reasonably safe way of moving forward. I downgraded stability a little bit, and the reason gets into that description I gave you of the character limitations of the shell.
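This is what a shell-transform program looks like against the environment variable interface; MB_REQUEST and MB_RESPONSE are the variable names as I recall them from the docs, so treat the specifics as assumptions.

```javascript
#!/usr/bin/env node
// Reads the JSON from environment variables (no command line length or
// quoting issues), transforms the response, and writes it to stdout for
// mountebank to pick up.
const request = JSON.parse(process.env.MB_REQUEST);
const response = JSON.parse(process.env.MB_RESPONSE);

response.headers = response.headers || {};
response.headers['X-Echoed-Path'] = request.path; // illustrative transformation
console.log(JSON.stringify(response));
```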
What happened was that, by still putting this stuff on the command line (this quoteForShell code down here was more or less the code that let me do it, managing the complexity of trying to figure out if you're on Windows and how to quote it exactly right), even if you weren't using the old interface, even if your shell program was only using the environment variables, it still introduced scenarios that would break mountebank, because it would put the command line arguments as part of the shell invocation. Sometimes, in certain shells on certain operating systems, that invocation would exceed the character limit supported by the shell itself. Even though you had no intention of using them, even though you didn't know they were being passed, mountebank would throw an error, because it exceeded the shell limitation.
For a while, what I tried to do was say, let me be clever: if you're on Windows, do this; if you're on Linux, do that. It was too much complexity. I don't know that I'm smart enough to figure out how to do it all, and even if I was, you would still run into edge cases. No matter how big the character limit of the shell is, there is a limit. It's possible to exceed that limit, especially if you're testing very large bodies for HTTP, for example. My first attempt was to truncate only on certain shells, but pretty soon I realized that was a mistake, so I had to truncate for everybody. This was a real tradeoff, and I think probably the pivotal moment in this talk, because there was no way for me to guarantee that I could do this without a version and without breaking people. I truncated for people who were on a Linux shell that had hundreds of thousands of characters as a limit, and I truncated for Windows, which maybe had a 1000 or 2000-character limit. There may have been people who used the old interface on Linux who, post-truncation, would get an error. I was unaware of any; I had zero feedback that that was the case. It was certainly a possibility, even if it was somewhat remote, because the way of passing on the command line wasn't around for very long before it switched to the environment variable approach.
Releasing a new version would have been the safest option by far to satisfy all of the constraints around stability in that scenario. However, it would have also forced consumers to upgrade. It would have been very noticeable to consumers. They would have had to read the release notes, figure out what they needed to change, and do all the testing associated with that. Alternatively, I could take the approach that I did, which was to just truncate in all cases, publish only the environment variable approach, and rely on the fact that it was unlikely to break anybody; and if it did, the error message would specify exactly what they needed to do to fix it, which is to switch to the environment variables. I was optimizing for the masses. I was optimizing for what would support most people in a very frictionless way, with a clear path of resolution for what may be zero people affected by the breaking change.
How To Think About API Evolution
That's uncomfortable, because it forces us to rethink API evolution away from an architectural pattern that guarantees stability, toward thinking about it as the users would think about it. I was really inspired by this thing called Hyrum's Law. Hyrum worked at Google. With a sufficient number of users of an API, it doesn't matter what you promised in the contract, because consumers will couple themselves to every observable part of the API. I remember for a while, when Microsoft would update Windows, they would have to add code to the updated operating system, because they would test not just the operating system itself, but third-party applications using the operating system. Third-party developers had done all kinds of very creative things with unpublished parts of the Windows SDK for a long time. As Microsoft changed these unpublished parts of the SDK (maybe somebody was doing something clever with the eighth bit that was unused in a byte, which was a real scenario that happened sometimes), they would have to detect that and write code in the new operating system that would continue to support the same behavior, even though it was never something that was guaranteed.
Hyrum's Law
There's a famous xkcd comic out there where a user complains that their Emacs workflow was taking advantage of the fact that when you held the spacebar down, it overheated the computer to create some side effect. The developer says, no, I just fixed the overheating problem. The user says, no, can you change it back to the old behavior? Hyrum's Law is a really humbling law for an API producer. Especially as one who has had a public API available for most of a decade now, I really relate to how frequently I find myself surprised at how people have hacked an API to do something that I didn't anticipate they could do, in a way that I wasn't intending to support, but that now oftentimes is supported. Mountebank is primarily a RESTful API, but some people embedded it in JavaScript, which I never really meant to support. Some people did that because it solves the startup time; it's just part of your website instead of a separate website. Now I have this accidental complexity of supporting a JavaScript API that you can embed in an Express application as well. That's an example of Hyrum's Law. It's mentioned in this book, "Software Engineering at Google," which is why I put it up there. I got a lot of value from some of the patterns of what Google has had to do to scale to 50,000 engineers.
API Evolution Is a Product Management Concern
We talk a lot about API as a product nowadays, usability, feasibility, viability being common descriptions of the tradeoffs in product management. I think that rethinking backwards compatibility evolution or breaking change evolution, from an architecture concern to a product management concern, is a much healthier position to think about how to manage the evolution of your API. I think that the tradeoffs that are represented by product thinking are more nuanced than the tradeoffs represented by architecture thinking. I think versioning is a very solid architectural pattern that guarantees stability in the case of breaking changes. There are always needs for that pattern. Mountebank itself has enough debt underneath it. One of these days, I would like to release a subsequent version that allows me to remove a lot of the cruft, a lot of the things I really no longer want to support, but have to because of some of these backwards compatible transformations that I'm doing.
If we think about viability, solving problems that our users have, in an API context I really like the idea of cognitive load that the authors of "Team Topologies" talk about. When I think about any product, what I really want it to do is simplify the underlying complexity. I really have no idea how my phone works. I have no idea how it connects to a cell tower. I don't understand the underlying mechanics, physics, material design; I barely understand the software. It presents a simplified interface that lets me use it. Same as driving a car: it's a couple of pedals and a steering wheel, with mirrors in the right places. I can drive without having to understand the underlying complexity of the system that I'm driving. I want my APIs to do the same thing. Usability really has been the focus of this talk: how do I manage evolution to that system, or to that interface, in a way that provides the most usable and stable experience for my users? Then, feasibility is very much an architectural concern: how do I do that in a way that is technically feasible, that protects downstream systems, and that satisfies the non-functional requirements of the overall ecosystem at large? Rethinking API evolution as product management has, for me, been a pretty profound way of understanding and empathizing with the needs of the consumers of mountebank. It's something that I'd recommend you consider as you're evolving your own API. Versioning is always an option that you can reach for; upcasting and some of these others, I think, would be valuable additions to your toolbox.
Questions and Answers
Betts: How do you hide the complexity and keep it from being too bloated?
Byars: A large part of that, in my context, was trying to centralize it. Almost all of the code in mountebank only knows how to respond to the newest interface that is documented and supported behind the contract. Most of the code doesn't have this legacy behind it. For upcasting, there's one hook in the request processing pipeline that calls the compatibility module. That's where all the transformations happen that convert from old to new. The exceptions are downcasting. A few downcast calls have to be sprinkled in certain strategic areas of your code. That is a little bit of debt that I'd love to clean up someday with a new version. For most of the transformations, it's pretty straightforward.
Betts: There was a question about returning a string instead of other data types. That made me wonder, a lot of your patterns you talked about are how you handle changes to the request to support different inputs. How do you evolve the response that you give to the consumer?
Byars: I don't think there is a path that I see for the producer managing backwards incompatible changes on the response without a version. In fact, this is one of the driving forces that would make me want to someday create a new version of mountebank, because there are some responses that I look at now and think, "I wish I hadn't done that."
Betts: Sometimes these changes happen, and you have to evolve because there are just new features you want to add. Sometimes it's a mistake in the original design. What drove you to make the change? Does that influence your decision?
Byars: Ideally, you're trying to be thoughtful in the API design to make plugging in new features an addition. That has been the norm; it's not universal. Generally speaking, that's the easier process. Sometimes covering up mistakes requires more thought on the API design change, in my experience. There are simplistic ones, where I really wish I hadn't created this endpoint, or accepted a PR with this endpoint that has a specific name, because it doesn't communicate what I'm really hoping that feature communicates to users, and it actually conflicts with some future features that I want to add. That actually happened. What I did in that case was a little bit of hidden interface combined with change by addition: I created the new endpoint with a name that wasn't as elegant as what I originally wanted, and I just compromised on that because it was more stable. Against my criteria: less elegant, but more stable. I accepted that tradeoff. Sometimes you can't get the API exactly the way you want because you have real users using it. That's not necessarily a bad thing, especially if you can keep those users happy. There was no attempt at deprecating the old endpoint; I created a new one that communicates what I want, with a little bit of compromise in the naming of fields. Then, of course, some of these other patterns that you see here are other strategies that do require more thought, more effort, than just adding a feature in most cases.
Betts: With your centralized compatibility module, how do you actually stop supporting deprecated features? With versioning you can delete the code that's handling version, whatever of the API, as long as it's in a separate module. Does this stuff live around forever?
Byars: Yes, I've never deprecated those features. It's hard for an open source product to know who's using what features; I don't have any phone-home analytics, and I don't intend to add any. As soon as I release something, you have to assume that you're getting some users of that feature. The good news is that with the centralized compatibility module, especially with upcasting, which is most of what I've done, it's relatively easy to adjust; I've been able to take one of these other patterns without too much fuss. Downcasting is the hardest. One of these days, especially for the response question that you asked, because that's where I have the most debt that I haven't been able to use these strategies to resolve, I would love to do a new version. That would be the opportunity to do a sweep through the code that I no longer want to maintain.
Betts: I'm sure mountebank v2 will be really impressive.
Byars: The irony is I did release a v2, but it was a marketing stunt. I looked at the [inaudible 00:48:21] spec and they say, if it's a significant release, you can use a major version. I felt pedantically validated with what they said. It was really just a marketing stunt, and I made sure in the release notes to say, completely backwards compatible.
Betts: There's no breaking changes.