John Sheehan on Web API Quality


1. I am Charles Humble and I am here at QCon SF 2015 with John Sheehan, cofounder and CEO of Runscope. John, could you perhaps start by introducing yourself?

Sure. I am the cofounder and CEO of Runscope. We make API performance monitoring tools. We have been around for just under three years and we exist to help bring visibility to API problems so that developers can build better applications and better end-user experiences and ultimately better businesses. My cofounder and I met at Twilio in 2010, I worked on a lot of different API things there, between developer experience and scaling and actually building the APIs. I went from there to IFTTT where I worked on countless API integrations as well and I felt like the world needed better API tooling and that is what we set out to do when we started Runscope.


2. So are there critical things that could improve developer experience for API consumers but tend to get overlooked?

Sure. The number one thing that is missing when it comes to consuming APIs is full visibility into what is happening when your code actually goes out and tries to talk to an API. You know, a lot of our other code is very highly instrumented; it is easy to get inside and debug the things that happen when it is just running on your machine or on a server that you control. Once you introduce an API, you have added a whole other layer of complexity and potential problem spots for actually achieving the functionality that you set out to write. You have network problems, and you are relying on somebody else’s code that just happens to be running on somebody else’s server, even though it is just as crucial to your user experience as the code that you write. So what we are trying to do is to extend out where the tools are, to bring visibility to those problems, because that was the number one thing that we kept running into with Twilio customers and during my time at IFTTT. It was: how do we deal with incorrect data when we do not control the source of the error, and how do we deal with slow network times, latency and that stuff, when we do not control the network where this data is coming from? A lot of API writers have not necessarily been investing in visibility to the levels that we thought were acceptable, so that was the angle we took. But it is nice that you start to see some of the top tier API providers raise the level of visibility, offering debugging consoles and more comprehensive logging tools within their dashboards. I think that is really what separates the top tier API providers from the run of the mill providers: focusing on visibility so developers can be more successful, more quickly.


3. How important is the role of a developer evangelist in improving the developer experience?

To me, developer evangelism has always been sort of the human face of the developer experience as a whole. Your documentation tends to be the single source of truth, the definitive place where people go for answers. If somebody runs into a problem, they may contact your support team or they may talk to one of your evangelists. What you really want is for your evangelist to reinforce the things that your documentation is telling them, with a little bit more context and color and experience that helps somebody solve whatever problem they are trying to solve. So if your evangelists are experienced, understand the range of problems somebody is likely to run into, and are consistent with the documentation, then developers are going to get a much better experience and you are going to be much more successful, more quickly, because of that. So I have always told all the evangelists I have hired and worked with, basically, “You are the human face of our documentation. Make sure that you are doing what you can to make sure developers are being successful.” Whether or not that involves our API or our product, just generally help them so that you become a trusted source of information. Overall you will become a part of who they go to when they have problems, and maybe someday that will lead to somebody being a customer.


4. What do you think are the particular characteristics that make for a good developer evangelist?

I think that the number one characteristic that makes for a good developer evangelist is working experience. So when I go out in the field, when I talk to developers, I want them to feel that I can relate to their problems, that I have been in their shoes and that I understand that writing code is probably the easiest part of being a software developer, right? There is working with teams, there is shipping and scaling and understanding customer needs, figuring out how to ship software within a large organization, dealing with politics and products and all this other stuff. So if I have been a working developer and I have been through that, then our customer is going to have a much easier time believing what I tell them about solutions. It really comes back to being that trusted partner. So first is working experience, and the second part of that is: if you cannot show empathy, then you should just not be a developer evangelist. Being able to put yourself in other people’s shoes is probably the one high-order skill that I really look for, and it has nothing to do with someone’s technical ability.


5. Apart from Twilio, what are some of your other favorite developer portals and why?

First of all, I love that this question starts with “apart from Twilio”. Twilio has established itself so much that no one wants to hear about it in an answer anymore. When I started there, there were just 10 of us and nobody knew who we were, so that was definitely not the case back then. It is amazing to see how far they have come. My favorite one recently is Clearbit, which is a business intelligence and data API that tells you information about people and companies. Their developer portal and documentation are really, really focused on visibility. You get really great logs for all of your API calls, you can try out the API from the portal entirely without having to write any code, and you get a really good sense of what functionality is there and what you are going to get back. The clarity of it and the attention to detail and design are really sort of unparalleled right now, so it is definitely one of my favorites. I think Stripe is also in the category of Twilio, having become the default reference material for what a good API developer experience looks like. One thing that is interesting about Stripe is that the dashboard and docs have not really changed much. The API has changed a lot, but the docs have not changed much over the past five years. I actually think developers appreciate that: it is stable and predictable enough that it does not need to be continually reinvented, and when you go start a new project, you are not going to have to relearn a new thing or understand how things work. It is consistent and long-lasting. It is what we want from our APIs and I think that is what we should demand from the tooling associated with the API as well: that it acts predictably and that it only changes when it has to.


6. To that end, what are the KPIs that you recommend for effectively monitoring and measuring the APIs?

If you are producing an API, then the KPIs are maybe different than if you are consuming somebody else’s API. But let us just assume that you are producing an API and you care about making sure that the developers consuming it are getting the best experience. The KPIs I would be looking for are first of all just general uptime: did it respond, and how many times did it not respond? We have had uptime monitoring around for what seems like forever. Over the entirety of my career, going back 20 years now, I think I have seen various uptime monitoring tools coming up. But that is still the baseline: if it is not up, it is not working. You cannot do anything with it. The second one we look at is: if it responded, how quickly did it respond? But we do not necessarily want the time to “the first byte” or even the time of a single response. What we are looking for when we talk about performance monitoring is to actually monitor the performance of specific functionality. That might include chaining multiple calls together or calling different APIs, and we try to group this functionality into things that are actually user-facing behaviors. For example, monitoring the sign-up flow, which may have 10 API calls, or a store look-up to see if an item for a retailer is in stock in one of the stores, which may include dozens of API calls that actually make that happen. What we want to do is test the entirety of that functionality and make sure that it is working end to end. You know, there is a study, I cannot quite quote the source of it, showing that mobile users will abandon something if it takes longer than 300 ms. So if they try to enter something in the app and it takes longer than that, they are going to abandon whatever task they were after. So you have to make sure that all the API calls that make up that task complete in under that time limit, not just one endpoint and how quickly it starts sending data back. So we try to look at performance from a functional standpoint.
And then the last part is, if it was up and it was fast, did you get the right data? It is not uncommon for an API to return incorrect data, so what we want to help people figure out, and where we see them be the most successful, is not just measuring the number of times that they have successfully validated the structure of the response, but also the content of that response. You can get a fast, well structured response that contains incorrect data and as far as your end-user is concerned, that is just as bad as being down. It is actually worse than being down. The complaints that we got at IFTTT about incorrect data showing up in people’s apps were way stronger than “this is down and unavailable right now”. So uptime, performance and correctness are the three areas that we look at. For uptime, minutes of downtime is basically the KPI. For performance, it is the number of functional regressions, and for data validation, it is the number of times the data was properly structured and the content of that message was also correct.
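Editor's illustration: the three checks described in this answer (did it respond, did the whole user-facing flow finish within a time budget, and were both the structure and the content of each response correct) can be sketched roughly as follows. This is a minimal sketch, not Runscope's implementation. The names `Step`, `StepResult`, `FlowReport` and `run_flow` are hypothetical, the API calls are simulated with plain functions so the example runs without a network, and the 300 ms budget is simply the figure mentioned above passed in as a parameter.

```python
import time
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass
class Step:
    """One API call in a user-facing flow (hypothetical structure)."""
    name: str
    call: Callable[[], dict]               # performs the call; raises if it fails
    required_keys: FrozenSet[str]          # structure check: keys that must exist
    content_check: Callable[[dict], bool]  # content check: are the values right?


@dataclass
class StepResult:
    name: str
    responded: bool
    elapsed_ms: float
    structure_ok: bool
    content_ok: bool


@dataclass
class FlowReport:
    steps: List[StepResult]
    total_ms: float
    within_budget: bool

    @property
    def passed(self) -> bool:
        # The flow passes only if it was up, fast enough, and correct.
        return self.within_budget and all(
            s.responded and s.structure_ok and s.content_ok for s in self.steps
        )


def run_flow(steps: List[Step], budget_ms: float,
             clock: Callable[[], float] = time.perf_counter) -> FlowReport:
    """Run the steps in order and time the flow end to end, not per endpoint."""
    results: List[StepResult] = []
    flow_start = clock()
    for step in steps:
        start = clock()
        try:
            body = step.call()
            responded = True
        except Exception:
            body, responded = {}, False
        elapsed_ms = (clock() - start) * 1000.0
        structure_ok = responded and step.required_keys.issubset(body)
        # Content is only checked once the structure is known to be valid.
        content_ok = structure_ok and step.content_check(body)
        results.append(
            StepResult(step.name, responded, elapsed_ms, structure_ok, content_ok)
        )
        if not responded:
            break  # later steps depend on this one, so stop the flow here
    total_ms = (clock() - flow_start) * 1000.0
    return FlowReport(results, total_ms, total_ms <= budget_ms)


if __name__ == "__main__":
    # Simulated sign-up flow; the second response is well formed but wrong.
    signup = [
        Step("create_user", lambda: {"id": 1, "email": "a@example.com"},
             frozenset({"id", "email"}), lambda b: "@" in b["email"]),
        Step("fetch_profile", lambda: {"id": 1, "plan": ""},
             frozenset({"id", "plan"}), lambda b: b["plan"] in {"free", "paid"}),
    ]
    report = run_flow(signup, budget_ms=300)
    for result in report.steps:
        print(result)
    print("passed:", report.passed)
```

In the simulated flow, the second step returns a well-structured body with an empty `plan`, which is the "fast, well structured response that contains incorrect data" case: it passes the structure check and fails the content check.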


7. Great. Let’s go a little bit deeper in that. What are some of the challenges for monitoring and testing APIs successfully?

A big challenge for monitoring and testing is spanning the entirety of the API lifecycle. You tend to have API developers who are building a new API to some spec; maybe it is on their local machine, maybe there is a Dev version of that API that they are iterating on, and the tools tend to be very siloed for that specific group of people. Once it graduates out of that initial Dev phase and goes to the acceptance phase, your QA team takes over and they start testing for more edge cases and doing load testing, that sort of stuff. They tend to write their own tests and live in their own silo as well. Then, by the time you go live with that API, you have an Ops team who is now trying to meet different KPIs, maybe it is just the uptime, maybe it is just the actual speed KPIs, and so they have to create their own tools and their own checks to make sure that they are meeting those metrics as well. But really, all those concerns are inextricably linked. You cannot separate the functionality from the speed or from the other concerns of the team. They are all one concern, but the tools tend to be very siloed by those teams. So a big challenge is working across teams. How do you make it so that when your Ops team discovers a condition in production that is causing API slowness, that feedback is brought back to the Dev team and to the QA team so that they can write better tests and improve their code, so that the condition that did show up during Dev time does not reoccur on the next iteration of the API in production? So we very heavily built our tools to try to bridge the gaps between those teams and to help the entire team work together on these performance characteristics.
We have a long way to go, but I think we have seen really early signs that when you can get those people to work together on the same sets of tests and data that you can really get a really positive feedback cycle going for API quality.


8. How does the design of the API impact how easy it is to monitor and to test?

Simpler APIs are simpler to test – that sort of sounds obvious. But I think people have a really hard time distilling APIs down into essential functionality or individual units that make it easy for a developer to understand. But if you do that well, then you are going to have naturally more success with the testing and monitoring. Ultimately, somebody is either going to consume this API to power their app, or somebody is going to consume the API in a monitoring or a testing scenario and if the API is equally simple for both of those cases, then no one will be blocked from getting that done. So I would say any chance you get to keep your API simpler, is going to have benefits beyond just making your monitoring and testing simpler and also help the developers who are consuming the APIs as well.


9. Once your API is monitored and quantified and so on, how do you help API providers improve their KPIs?

This sort of goes back to the full life cycle, that ideal that we are shooting for, right? I think that the biggest thing is enabling collaboration between those teams. It is one thing to actually get them together on the same set of data, but how do you give them tools to react to errant conditions, to relay that back to the teams, to represent that to each other and to get a history of some of those things, so that when developers encounter a new problem, they can go look up our history on similar problems? I think that collaboration tooling across team boundaries is really, really nascent right now and a huge opportunity for improvement. You are seeing a lot of new tools spring up and we want to encourage that; we love that other tools are doing that as well. But it is amazing that it really comes down to communication, like almost every other software development problem. If you can get more people talking to each other and give them good, accurate data to base decisions on, then you will start to see the quality improve pretty quickly.


10. What is the difference in strategy for monitoring API versus monitoring microservices?

The biggest thing is that most microservices are on private networks, so they are not publicly exposed to the internet. That makes it trickier to do something like globally testing latency from around the world, because if it is in a single AWS VPC or on your own hardware somewhere, getting a more complete picture of microservices can be tricky. We very heavily promote our hybrid on-premises approach – I got a little marketing in there for a second – but it is essentially allowing people to drop agents within their infrastructure, to see that data and to try to split it out within their infrastructure so that they can simulate clients, whether they are external or internal, using the same sets of tests and tools as much as possible. We have had some customers have success with that; we have a great story from Omnifone on our website that shows how they used our agent to better test their microservices. But the biggest thing is network access and how your tools access and react to the different environments that those different APIs live in. Ultimately, a public API and a microservice API that is using HTTP have very similar interfaces, and I think that tracking the performance, uptime and correctness, beyond just the access control, is pretty standard between the two.


11. Does the scale of microservices have an impact on complexity?

Yes, a little bit, because when you break down your infrastructure into a lot of smaller pieces, any one piece of functionality tends to have a lot more dependencies than if you had bigger APIs or a single API. Within Runscope we have 70 internal services, and that introduces some pretty hairy dependency chains. So part of what we are trying to do when we test those interfaces and those service boundaries is to make sure that all the dependencies are working as well. We try to put as much testing as possible on any individual service to make sure that we are meeting individual service contracts, but we also have to test APIs that aggregate across multiple services, in a way that we do not with our public API. So what we are striving for is to make sure that our tools can live at any level of granularity, whether it is just high level availability, or low level, high volume testing of many, many endpoints in many, many services, or somewhere in between with the aggregation layers. We want to be able to live in any one of those layers, no matter where there is a service boundary.
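Editor's illustration: one way to picture testing along a dependency chain is to check services in dependency order, so that a failure in a low-level service is reported once and everything built on top of it is marked as blocked rather than failing separately. The sketch below is an assumption-laden example, not how Runscope tests its own services. The function `check_services`, the status labels and the service names are all hypothetical, and `is_healthy` stands in for whatever contract test a team actually runs. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable


def check_services(dependencies: Dict[str, Iterable[str]],
                   is_healthy: Callable[[str], bool]) -> Dict[str, str]:
    """Check each service after its dependencies.

    `dependencies` maps a service name to the services it depends on.
    Returns a status per service: "ok", "failed" or "blocked".
    """
    status: Dict[str, str] = {}
    # static_order() yields every service after all of its dependencies.
    for service in TopologicalSorter(dependencies).static_order():
        deps = dependencies.get(service, ())
        if any(status[dep] != "ok" for dep in deps):
            # A dependency is broken, so this service cannot be judged fairly.
            status[service] = "blocked"
        elif is_healthy(service):
            status[service] = "ok"
        else:
            status[service] = "failed"
    return status


if __name__ == "__main__":
    # Hypothetical services: an aggregating API in front of three others.
    graph = {
        "public-api": ["auth", "billing"],
        "auth": ["db"],
        "billing": ["db"],
        "db": [],
    }
    # Simulated contract test in which only "billing" is broken.
    print(check_services(graph, lambda name: name != "billing"))
```

Here the aggregating `public-api` is reported as blocked because `billing` failed, while `auth` and `db` are still checked on their own contracts.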

Charles: OK. Changing tack a little bit, we have had quite a few companies in the news recently who have been shutting down or deprecating public APIs. So I wanted to get your perspective on the API economy and basically where you think it is going.

Sure. I think there was a similar over-exuberance about the potential for open APIs and platforms out there, and I think that as an industry, we have since adjusted our direction based on failures and successes and seeing what works and what does not. You have companies where the entire product is an API, like Twilio, like Stripe, like SendGrid, services that are doing great and continuing to drive increased usage. There is no threat of any of these companies shutting down their APIs any moment now. Every API call directly benefits both sides of that equation. So you call Twilio, you get a benefit for your users, they get money for sending a message, and I think that is the model that we are now trying to bring to other areas. The consumer apps that are having a lot of shutdowns or deprecations of APIs did not necessarily have that mutually beneficial API transaction, right? But now you see new services come along. Uber, for example, has an API, and every Uber API call generates revenue for Uber because it makes a car show up and bills somebody, and the developer gets to add additional functionality to their application. So again, you get both sides benefiting from that API call, and I think that is going to be the model going forward. Every API call you make should benefit the API provider as much as or more than it benefits you, and that is how you know that the API is not going to go away and can be trusted for you to depend on in your business. If that is not present, and you want to be dependent on someone’s API, then you should be talking to their business development department to see what is possible to make sure that that is the case.

Charles: Well then, I think that is a great place to end. Thank you very much.

Thank you for having me.

Feb 27, 2016