Untangling an API-First Transformation at Scale. Lessons Learnt at PayPal – Part 1
- Implementing an API-first transformation in a large organization is as much a people problem as a technical one
- Organizational chutzpah will strongly influence your API-first success strategy
- Treating the transformation itself as a product promotes long term success
- Process and governance are necessary, but keep it light and make it work for your customers
- Don’t underestimate the investment in tools, infrastructure, and people required to make it work
Getting Your API Strategy on
It wasn’t that long ago the idea of exposing your core plumbing externally was verboten. The real products were the apps and experiences you built on top of that internal stuff. Most product investments resulted in vertically integrated solutions that served a relatively limited set of functionality. Leveraging that investment for a totally different set of use cases or a different product line was often somewhere between difficult and impossible. In that context, it was natural to think of the underlying infrastructure as sunk cost or internal-only plumbing. It was hard to imagine it differently.
That’s changed. More or less every company now has an API strategy that strives to leverage a much larger portion of its technology investment by exposing the outcome in the form of reusable services. The high-level goal is to accelerate business agility by making core business capabilities easily accessible and reusable. Reducing the cost of integration not only has a positive impact on internal velocity, it also provides increased flexibility and speed when incorporating customers, partners, and acquisitions. The days of monolithic, vertically aligned mudballs are rapidly coming to an end. Where we’re going is what’s now generically referred to as “microservices”: services exposed via nicely designed and documented APIs, each with a clear bounded context and semantically distinct business value.
The API Economy
Why the change? The emergence of the API economy has changed the rules.
Some of the highest value companies in the world - Facebook, Google, Amazon - all figured out a while ago that opening the kimono and providing external developer access to more of the plumbing of their core products was a great way to leverage their tech investment, reinforce their product portfolio, and expand partnership opportunities. They built cultures that reinforced share and re-use principles, which were key to making this possible. This catalyzed the API economy and enabled many, many new products to be built and market-tested faster and more cheaply than before. API providers reaped the benefits of more customers, more traffic, deeper integration, and, at least for some, the seeds of new, multi-billion dollar businesses. Even Twitter owes much of its early success to an API strategy even if they subsequently pulled back from it.
The second wave - companies like Twilio and Stripe - skipped the traditional “product” part altogether. They made the APIs themselves their core product and focused on providing great developer experiences as their primary mission. Their customers quickly integrated and got the benefit of essential, needed capabilities without having to build and maintain a lot of non-differentiating functionality themselves. Agility improved. Hackathons happened. Apps and start-ups exploded. It became the default way to build new experiences, iterate, and verify product-market fit.
To remain competitive as an established player, your organization also needs to adopt an API-first approach. It is a nontrivial change to how you look at and think about your tech portfolio.
Actualizing Your Future
For newer companies, this approach is already second nature. They’ve likely grown up inside the new reality. But what if you are a large, established company with a lot of existing infrastructure and customers? How do you go about making this transition while still running your business?
First, your tech landscape probably looks quite different. Your legacy code base - probably influenced by SOA, but developed before “microservices” was even a word – is likely sub-optimal. You may have built a lot of tech debt over the years and you may not have a culture that natively reinforces the principles of clear componentization and reuse. Testing may be spotty and there are likely skeletons left over from various re-orgs and abandoned projects. Documentation may leave a lot to be desired. This is not a nice, level foundation on which to start a major microservices transformation. There are, as we say, a lot of “challenges”.
The other major consideration is organizational chutzpah. Executives may be sold on the need for an API-first strategy, but their understanding of the commitment required to get there may vary. Understanding where you are on this spectrum is very important to your success.
On one end, you get the Manhattan project. The bus stops, everybody gets 100% with the program, it’s the top priority, and almost everything else is put on hold. This is rare. If you find yourself in this situation, run with it. Treat it as a massive project and move quickly. You’re not likely to get a second chance.
More likely, practical reality puts you somewhere in the middle. There may be a commitment to slow the bus a little, but it’s definitely going to keep moving and the wheels will need to be changed along the way. As a leader of the transformation, this feels a lot messier. It raises a lot of difficult and important questions that don’t have obvious answers. How is it going to be prioritized? Which teams are going to do it? How is it going to be coordinated? When will it be done? Your job just got a lot more complex. It’s no longer a project. It’s now a complex transformation that may continue for years and it’s as much about organizational change as it is about a technology transformation.
Put on Your Product Hat
At PayPal, we’ve been on this journey for over three years. We’ve used a customer-oriented approach to go from a very monolithic, siloed architecture to a much more loosely coupled set of over 150 services with well designed, modern APIs.
There is a tendency to treat programs like this as engineering projects. We didn’t. Instead, we framed this as a larger, organizational change problem with the “product” being a fundamental shift in how we design and build APIs. Viewed that way, we were forced to identify and serve all key “customers” – developers who build and consume APIs as well as the executives that support them. It shaped our strategy, the tooling we built, how we communicated, and how we measured success. Identifying your customers and focusing on their satisfaction is a fundamental product principle. This mindset has been key to our success and it continues to shape how we manage the program today.
While no two companies are the same, many of the lessons we’ve learned and approaches we’ve used are applicable to other organizations on the same path.
To break this down, we’ve organized the effort using a framework we call the “3P’s”.
- Product – the infrastructure, standards, and tools used to manage the portfolio of APIs and underlying service implementations
- Portfolio – the catalog of business capabilities, represented as APIs and underlying services
- Program – the metrics, training, and levers used to incent changes in organizational behavior and technology investment
This article is the first of three that explore these dimensions in more detail.
In a small company of a few dozen to maybe 50 or 60 developers, it’s not that hard for the key stakeholders to get in a room, hash out a plan on a whiteboard, divide responsibility, and get to work. There will be a pretty good, shared understanding of the plan and goals, and, after some iteration and course correction, you’ll likely end up with a nicely integrated and cohesive set of APIs and underlying services representing your platform. Your timeline is probably represented in months, or maybe quarters.
In a large organization, things are different. You may have hundreds or thousands of developers spread across multiple business units and geographies, all with differing objectives and, often, different tech stacks. For many reasons, not the least of which is Dunbar’s number, you will not have a Kumbaya moment where everybody holds hands, agrees on the goal, and gets to work. What you need is something more distributed, more scalable, something that reinforces your objectives without depending upon the frailty of social contracts or large scale project coordination. In this context, the “Product” is actually the infrastructure that supports the transformation (not the APIs themselves). This is a significant investment in and of itself. At a minimum, it should probably include the following:
- A playbook defining common principles and standards for APIs
- An engagement process to ensure compliance when developing APIs
- A means of governance including clear and objective measurement
- Some kind of centralized portal to consolidate APIs and manage the process and metrics
Principles and Standards
At PayPal, we started off thinking about what common ideas would guide us towards our goal of nicely encapsulated services that exposed well-designed, reusable APIs. These became our core principles and they framed much of our subsequent standards documentation.
- Loose Coupling - Services and consumers must be loosely coupled from each other.
- Encapsulation - A domain service can access data and functionality it does not own through other service contracts only.
- Stability - Service contracts must be long-lived with a predictable lifecycle.
- Reusable - Services must be developed to be reusable across multiple contexts and by multiple consumers.
- Contract-Based - Functionality and data must only be exposed through standardized service contracts.
- Consistency - Services must follow a common set of rules, interaction styles, vocabulary and shared types.
- Ease-of-Use - Services must be easy to use and composable by consumers.
- Externalizable - Services must be designed so that the functionality they provide is easily externalizable.
From there, we settled on REST/JSON as the primary interface standard and developed a comprehensive style guide to help ensure consistency across the hundreds of Scrum teams who may build services in the organization. This included things like URI structure, header usage, status codes, query params, versioning, resource naming, security, logging, error handling, hypermedia, etc. We’ve published key elements of our style guide here.
Something else we had to settle is what format to use for API contract documentation. We wanted all service developers to use the same format to document their APIs so that we could leverage common tooling and infrastructure.
When we started, the API market was much less mature and there wasn’t really a de-facto API documentation standard. In the end, we adopted Google Discovery Document (GDD), as it seemed like a) as good a choice as any, and b) better than most. From a community support and adoption standpoint, this promptly went nowhere. We ended up developing quite a bit of tooling and support for GDD before it became clear around mid 2015 that OpenAPI (fka Swagger) was where most of our developers wanted to be. Around that time, we made the decision to migrate all our APIs to the OpenAPI Specification and we joined the OpenAPI Initiative (OAI). It’s already started to pay dividends: it has reduced our infrastructure investment, it is generally more robust, and our developers are happier working with it.
When you say the word “governance”, most people have a reflexive reaction – and not a good one. This feels like a Big Company term that is synonymous with “bureaucracy” and “slow”. People want to run away. The practical reality is some kind of process is unavoidable if you’re trying to enforce standards and consistency in a large, highly distributed organization at scale. It’s important to realize that this isn’t just a technology problem. That’s relatively easy compared to the really hard problem, which is changing human behavior. Good intentions are fine, but some form of non-optional process is needed to reinforce the outcomes you want. That said, it’s really, really important to make the process as simple, lightweight, and fast as possible to minimize friction and maximize satisfaction.
The basic engagement process we developed looks like this:
The process begins when the team building an API submits a proposal outlining their plans, including use cases and, typically, a proposed design.
- Alignment – a central team, in collaboration with the domain architect and product lead, determines the name, namespace, resources, etc. The goal is to fit the API within the context of the larger portfolio and ensure it “makes sense” from an outside-in perspective (more on that later).
- Review – the team documents the API using the standard format and a cross-domain team of API design experts provides feedback and suggestions on how to improve usability, consistency, and how to meet expected standards. This typically goes through several cycles of refinement.
- Score – designated API design committers score the API design against a set of canonical criteria that reflect design standards.
- Verify – API owners verify their reviewed and source-controlled API contract matches their underlying service implementation. They use API conformance tooling to compare request/response samples from their CI job to the versioned API contract and generate a score.
Post verification, the underlying API service implementation is deployed and the lifecycle state of the API is updated appropriately. Customers now know which version of the documentation is relevant to their integration.
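To make the Verify step more concrete, here is a minimal sketch of what comparing captured response samples against a contract might look like. It is an illustration only, not PayPal’s conformance tooling: the flat `CONTRACT` dictionary, the field names, and the scoring rule (fraction of samples that match exactly) are all assumptions, and real tooling would work from the full versioned API contract rather than a hand-written type map.

```python
# Hypothetical, simplified contract: field name -> expected Python type.
# A real contract would be a versioned API specification; this shape is an assumption.
CONTRACT = {
    "id": str,
    "amount": str,
    "currency": str,
    "status": str,
}


def check_sample(contract, sample):
    """Return a list of readable mismatches between one sample and the contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in sample:
            problems.append(f"missing field: {field}")
        elif not isinstance(sample[field], expected_type):
            problems.append(f"wrong type for {field}: {type(sample[field]).__name__}")
    for field in sample:
        if field not in contract:
            problems.append(f"undocumented field: {field}")
    return problems


def conformance_score(contract, samples):
    """Fraction of samples with no mismatches (an illustrative scoring rule)."""
    if not samples:
        return 0.0
    passing = sum(1 for s in samples if not check_sample(contract, s))
    return passing / len(samples)


# Made-up samples, standing in for request/response captures from a CI job.
samples = [
    {"id": "PAY-1", "amount": "10.00", "currency": "USD", "status": "COMPLETED"},
    {"id": "PAY-2", "amount": 10.0, "currency": "USD", "status": "COMPLETED", "debug": True},
]

score = conformance_score(CONTRACT, samples)
problems = check_sample(CONTRACT, samples[1])
print(score)
for problem in problems:
    print(problem)
```

The second sample fails on two counts: `amount` arrives as a number rather than a string, and `debug` is not in the contract. Reporting each mismatch by name, rather than only a score, is what lets an API owner fix the implementation or the contract without asking for help.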
It’s worth pausing for a second to reiterate the importance of customer satisfaction in this process. It’s an area where we fell short in the beginning. What we found is while we did a good job designing the process, we were pretty unprepared to operate it at scale. We radically underestimated the infrastructure required and we had way too many manual steps. It took too long, customer expectations were not set, and every problem and every delay became the fault of the process. There were a lot of complaints and we needed to turn things around quickly or risk derailing the entire effort.
The solution was multi-faceted and included the following:
- Double down on automation. Many of the steps required manually moving things from one state to another, updating data, and sending out notifications. Through automation and self-service, we drastically reduced the “monkey-work”, improved responsiveness, and made customers much happier. This isn’t something you can manage in a spreadsheet.
- Set realistic expectations. Imposing a new process on developers, by definition, is new work. Developers needed to understand that API review was something they needed to bake into their plans and account for. They also needed to understand when to do it in their software development lifecycle (hint: as early as possible). Broad outreach and training helped set realistic expectations and improve satisfaction.
- Listen and react quickly. You won’t get it right the first time, so talk to your customers and show them you care by making changes quickly. This builds trust and reinforces their dedication to the process the next time they engage. Most developers want you to succeed if they understand the broader vision. They’ll support you and be patient, but you owe it to them to listen and repay that trust.
We managed to drastically reduce customer complaints within about a quarter and we continue to iterate on all these facets to this day. Having better infrastructure support at launch is probably the number one thing we’d do differently if we had to do this again. Number two is spending a lot more time in marketing and outreach to educate teams, set expectations, and listen to feedback. As with any product launch, it’s better to delay the start a little if it means giving your customers a great experience and creating a positive feedback loop.
Large companies tend to be very metrics driven. Every product and program tends to get boiled down to a metric or two and a status on a dashboard. To succeed, you need to think carefully about what you measure, how this relates to the business value, and how to set the right milestones along your journey.
In an API-driven transformation, the two high level business benefits are reduced integration cost and increased business agility. Refactoring your business capabilities into nicely designed and documented APIs makes it easier to integrate – internally and externally, with partners and customers – and it allows you to quickly recombine functionality in different, innovative ways. This improves your time-to-market and increases efficiency by minimizing redundant investment and expanding addressable markets. Unfortunately, both of these benefits are very hard to directly attribute and measure. The best you can probably do is find an indirect metric that, through consensus, can be correlated with these broader benefits.
The way we approached this is to develop a maturity model that represented a sort of quality score for how close APIs are to the ideal. The “ideal” in this case was a fully encapsulated service that followed all API design standards and performed well. The idea is that the business benefits will be best realized when the majority of the business capabilities are exposed through high maturity API services.
Using our standards as a starting point, we created criteria and placed them on a scale from 1 to 5. We also added criteria that measured whether an API fit well within the broader business capability portfolio and encouraged certain operational attributes through inclusion of criteria for SLA compliance, test coverage, etc. By versioning the maturity model and manipulating where each criterion sat on the scale, we’ve been able to progressively march the API portfolio towards a more ideal state. We measure the maturity score of each API based on the lowest-scoring criterion that failed. This enables us to roll up the overall quality of the API portfolio using a simple maturity metric, normalized across all services.
This image is a snapshot of our scorecard that shows a maturity assessment for an API. Each image represents a criterion derived from the standards we want to enforce. Developers use it to quickly understand which criteria passed and which failed. They click on each, drill into the details, make corrections, and update their score until they hit their maturity goal. Over the last few years, we’ve evolved both the model and infrastructure. We now support multiple, simultaneous versions of the maturity model, which allows us to evolve and refine our standards over time without breaking the maturity level of pre-existing APIs. APIs can upgrade to new model versions as they iterate and release new interface versions. We’re now in a relatively mature state that gives developers the feedback they need - on-demand and at scale - to continuously improve the quality and consistency of their APIs.
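One way to picture this scoring rule is the sketch below. It rests on assumptions that are not from the program itself: the criterion names and their levels in `MODEL_V1` are invented, the interpretation that an API sits one level below its lowest failed criterion is a guess at how "lowest failed criterion" translates into a score, and the portfolio roll-up is a plain average.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    level: int  # position on the 1-5 maturity scale


# Hypothetical model version; criterion names and levels are illustrative only.
MODEL_V1 = [
    Criterion("contract_documented", 1),
    Criterion("naming_standards", 2),
    Criterion("error_handling_standards", 3),
    Criterion("test_coverage", 4),
    Criterion("sla_compliance", 5),
]


def maturity_level(model, passed):
    """Assumed rule: an API sits one level below its lowest failed criterion."""
    failed_levels = [c.level for c in model if c.name not in passed]
    if not failed_levels:
        return max(c.level for c in model)
    return min(failed_levels) - 1


def portfolio_maturity(model, results):
    """Average maturity across APIs; results maps API name -> set of passed criteria."""
    levels = [maturity_level(model, passed) for passed in results.values()]
    return sum(levels) / len(levels)


# Made-up assessment results for three made-up APIs.
results = {
    "payments": {c.name for c in MODEL_V1},
    "invoicing": {"contract_documented", "naming_standards", "test_coverage"},
    "disputes": {"contract_documented", "naming_standards"},
}

levels = {api: maturity_level(MODEL_V1, passed) for api, passed in results.items()}
average = portfolio_maturity(MODEL_V1, results)
print(levels)
print(average)
```

Note that `invoicing` passes the level-4 test coverage criterion but still scores 2, because it fails a level-3 criterion. Under this rule, publishing a new model version that moves a criterion lower on the scale raises the bar for every API assessed against that version, which is one way the portfolio can be nudged forward over time.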
The portal may be the most important piece of infrastructure. Given the scale of a large organization, you really need a central place to consolidate all APIs, all documentation, the engagement process, and the overall program status and progress. This ends up becoming the internal developer portal and it’s critical to communicate with and coordinate the actions of the organization.
A really interesting thing happened at PayPal when we started. Prior to launching our site, we had developed the bulk of our standards and documentation on an internal wiki. Nobody paid attention. It was lost in the sea of disorganized and outdated documentation that tends to build up in any long-term corporate wiki.
As an experiment, we decided to launch a simple web site with mostly static content that was 95% the same as what was already available on the wiki. We gave it a decent visual design and a first class URL on the intranet. We spent a week building it. Almost overnight, things changed. Executives started pointing to the site as “the future of PayPal”. It showed up in presentations, we started generating traffic, and people started taking it seriously. All we really did was change the packaging, but perceptions changed radically. We thought it would make a difference, but the magnitude really surprised us. Suddenly, we had credibility and some momentum to build upon.
It was a turning point. It was also a moment that reinforced how important it was to think about the program from a product, not just a technology standpoint. In the end, what we’re trying to do is change the behavior of people. The technology transformation is really just a byproduct. To do so, you need to stay focused on your customers and think from their perspective. It’s a lesson that stuck with us and we’ve evolved the site and the program into something much more customer-focused and robust. It’s now one of the most visited internal apps in PayPal. Among other things, the portal provides:
- Inventory of all APIs including interfaces, documentation, samples, metrics, ownership, a feedback mechanism, and maturity assessment details.
- All standards documentation, versioning information, and guidance.
- Lifecycle state management of versioned API contracts.
- An engagement process to propose and track status of new APIs throughout the development lifecycle.
- Program level reporting and dashboards on maturity levels and progress towards milestones.
This investment hasn’t been trivial, but it’s been absolutely essential to operationalize the goals of the program.
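As a rough illustration of what tracking a proposal through the engagement process could involve, the sketch below models the stages described earlier (proposal, alignment, review, score, verify, deployment) as a small state machine. The stage names follow the article, but the class, the transition rules, and the idea that only review may repeat are assumptions made for the example, not a description of the portal’s implementation.

```python
from enum import Enum


class Stage(Enum):
    PROPOSED = "proposed"
    ALIGNMENT = "alignment"
    REVIEW = "review"
    SCORE = "score"
    VERIFY = "verify"
    DEPLOYED = "deployed"


# Allowed transitions (assumed). Review may repeat, reflecting several cycles
# of refinement; every other stage moves forward exactly one step.
TRANSITIONS = {
    Stage.PROPOSED: {Stage.ALIGNMENT},
    Stage.ALIGNMENT: {Stage.REVIEW},
    Stage.REVIEW: {Stage.REVIEW, Stage.SCORE},
    Stage.SCORE: {Stage.VERIFY},
    Stage.VERIFY: {Stage.DEPLOYED},
    Stage.DEPLOYED: set(),
}


class ApiProposal:
    """Hypothetical record of one API moving through the engagement process."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.PROPOSED
        self.history = [Stage.PROPOSED]

    def advance(self, target):
        """Move to the target stage, rejecting transitions that skip a step."""
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(
                f"{self.name}: cannot move from {self.stage.value} to {target.value}"
            )
        self.stage = target
        self.history.append(target)


proposal = ApiProposal("invoicing")
for stage in [Stage.ALIGNMENT, Stage.REVIEW, Stage.REVIEW,
              Stage.SCORE, Stage.VERIFY, Stage.DEPLOYED]:
    proposal.advance(stage)
print([s.value for s in proposal.history])
```

Keeping the full history, not just the current stage, is what makes automated notifications and program-level reporting possible without anyone moving items by hand.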
In the next article, we’ll go into how we used this infrastructure to help shape the API portfolio itself. A main goal was to establish clear boundaries between APIs that encourage decomposition of the monolith in a predictable, useful way.
About the Author
Erik Hogan has been learning what it means to be a great Product Manager for almost two decades. Much of that time has been spent understanding how to be a customer champion and trusted leader who can scale. He derives great satisfaction from factoring complex problems into reusable parts and giving engineering exactly what they need to execute quickly. Lately, he’s been applying fundamental product concepts of simplicity, storytelling, and focus to effect organizational change - a very different type of "product". Also, his wife still gets mad when he asks for her success criteria.
BTW, we always hear about FB, AMZN, GOOG as examples ... there are others that have done equally well. I hope people also pay attention to what is happening outside the US and outside business, and learn from that too. One of the best examples is the platform called Aadhaar ("foundation"), built by the Government of India. Its goal is to provide a unique ID for more than 1 billion people. It achieved that goal in less than 10 years (faster than WhatsApp, FB et al?). Now the number of applications and business use cases being built on it is mind-boggling. This is because they had the vision of going the API-first route, even though it was not strictly required. It could be done because an ace team with a start-up culture was created by Nandan Nilekani (Infosys co-founder, and the inspiration behind the book The World Is Flat).
Governments around the world are trying to take a leaf or two from its success. I am pretty sure startups with business goals can also learn from it.
Couple of pointers:
1) Book - Rebooting India: Realizing a Billion Aspirations