Scaling GraphQL Adoption at Netflix: Tejas Shikhare at QCon San Francisco 2022

At QCon San Francisco 2022, Tejas Shikhare, senior software engineer at Netflix, presented Scaling GraphQL Adoption at Netflix. Shikhare has been working on Netflix’s federated GraphQL platform, distributed systems, and, more recently, developer tools and education. This talk is part of the editorial track Modern APIs: Building and Evolving.

Shikhare started his talk with an introduction to GraphQL, an alternative way for clients and servers to communicate through APIs. In GraphQL, a schema defines the data graph. Three root types (query, mutation, and subscription) serve as the entry points to the data graph.

GraphQL offers a few benefits: it minimizes round trips to the server, provides strong typing, and acts as a visual representation of your APIs across different domains. Shikhare summarizes GraphQL:

Simply put, GraphQL gives you the ability to fetch exactly the data you want from the server. Not more. Not less.
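The idea of fetching exactly the requested fields can be illustrated with a small sketch. This is a toy projection over a nested dictionary, not a GraphQL implementation: a real server validates the query against a typed schema and runs resolvers. The record and its field names (other than movieId) are assumptions made up for the illustration.

```python
# Toy illustration only: project a nested dict onto a selection of fields.

def select(data, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to True (leaf field) or to a nested
    selection dict (object field).
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is True else select(value, sub)
    return result

# Hypothetical record; the field names are made up for this sketch.
movie = {
    "movieId": 1,
    "title": "Example Title",
    "synopsis": "A long description the client does not need.",
    "cast": [
        {"name": "Actor A", "bio": "..."},
        {"name": "Actor B", "bio": "..."},
    ],
}

# Roughly corresponds to the query: { title cast { name } }
query = {"title": True, "cast": {"name": True}}
response = select(movie, query)
```

Here `response` contains only `title` and the `name` of each cast member; `synopsis` and `bio` are never returned because the client did not ask for them.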

Netflix started its GraphQL journey with a dedicated server, named DNA API, that communicated with its microservices, a common pattern for companies with GraphQL servers. At the time, between 2012 and 2015, Netflix was using an internally developed tool called Falcor.

The DNA API grew more prominent over time and ran into a few issues. Code changes were required in both the microservice and the API layer, often by different teams. Consequently, the API team had to be experts in many domains, act as the first line of support, and frequently make code changes. This resulted in slow build times and cascading failures.

Federated GraphQL would resolve these issues by extending types across service boundaries and enabling each team to implement its own part of the API. In the example from the talk, each service knows about movieId and hydrates the fields it owns.
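A minimal sketch of that idea follows, with plain functions standing in for separate services. Only movieId comes from the article; the service names, the other fields, and the merge helper are hypothetical, and real federation does this through a gateway and schema directives rather than direct function calls.

```python
# Each function stands in for a separate service (subgraph).
# The service names and fields are hypothetical.

def catalog_service(movie_id):
    # Owns the basic Movie fields.
    titles = {1: "Example Title"}
    return {"movieId": movie_id, "title": titles[movie_id]}

def ratings_service(movie_id):
    # Extends Movie with a field that this team owns.
    ratings = {1: 4.5}
    return {"movieId": movie_id, "averageRating": ratings[movie_id]}

def hydrate_movie(movie_id, services):
    """Merge the fields each service contributes for one movieId."""
    movie = {}
    for service in services:
        part = service(movie_id)
        # The shared key must agree across services.
        assert part["movieId"] == movie_id
        movie.update(part)
    return movie

movie = hydrate_movie(1, [catalog_service, ratings_service])
```

Each team changes only its own function, yet clients see a single Movie object keyed by movieId.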

In a federated GraphQL architecture, there are three components. First, a domain graph service (DGS) is responsible for implementing a subgraph. Next, a schema registry is responsible for validating each subgraph and merging them to compose a supergraph. Last, the supergraph is exposed to clients via a highly available GraphQL gateway service. When a client sends a query to the GraphQL gateway, the gateway is responsible for breaking the query apart into subqueries and sending them to the DGS that owns each part.
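The registry and gateway roles described above can be sketched as two small functions. This is a simplified model under stated assumptions: each field has exactly one owning service, the service and field names are hypothetical, and real composition also handles types, shared keys, and nested selections.

```python
def compose_supergraph(subgraphs):
    """Registry step: merge subgraph field sets, rejecting conflicts.

    `subgraphs` maps a service name to the set of fields it owns.
    Returns a dict mapping each field to its owning service.
    """
    owner_of = {}
    for service, fields in subgraphs.items():
        for field in sorted(fields):
            if field in owner_of:
                raise ValueError(
                    f"field {field!r} is defined by both "
                    f"{owner_of[field]!r} and {service!r}"
                )
            owner_of[field] = service
    return owner_of

def plan_query(requested_fields, owner_of):
    """Gateway step: group requested fields into one subquery per service."""
    plan = {}
    for field in requested_fields:
        if field not in owner_of:
            raise KeyError(f"unknown field {field!r}")
        plan.setdefault(owner_of[field], []).append(field)
    return plan

# Hypothetical subgraphs registered by two teams.
subgraphs = {
    "catalog-dgs": {"title", "synopsis"},
    "ratings-dgs": {"averageRating"},
}
supergraph = compose_supergraph(subgraphs)
plan = plan_query(["title", "averageRating", "synopsis"], supergraph)
```

The resulting `plan` tells the gateway which fields to request from which DGS; validation at composition time is what catches two teams defining the same field before it reaches production.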

Although GraphQL and this architecture at Netflix today serve more than 1 billion daily requests, cover more than 10,000 types and fields, and support more than 500 active developers, there are a few challenges with federated GraphQL. Shikhare cites the following issues at Netflix:

  • Federation and GraphQL have steep learning curves.

  • In a federated GraphQL architecture, multiple players frequently make changes to the schema, which can lead to inconsistent schema design.

  • The graph becomes too big to collaborate on. Naming conflicts are a common problem, and namespacing alone causes more harm than good.

These issues lead to a central question: Although Federated GraphQL gives freedom for each team to move fast, is it allowing developers to be responsible stewards of the API?

To answer this question and solve these growing issues, Shikhare and his team came up with a workflow called Collaborated Schema Design. They created a few tools to facilitate and reinforce adoption of the workflow.

  • GraphHub is a schema collaboration tool that reduces collaboration challenges between client and server teams. GraphHub is a monorepo that holds all the schemas and syncs with the Schema Registry to stay current with the latest schema from production. Since GraphHub is a repo, any developer can make a proposal via a pull request.

  • Alongside GraphHub, Shikhare’s team also created a Schema Working Group, open for anyone to join, to set and uphold standards.

  • GraphDoctor, a schema linter, was created to help keep APIs consistent in the massive multi-player environment. GraphDoctor listens for new pull requests and uses codified schema guidelines as linter rules to keep API schema designs consistent.

  • Graphlabs creates a sandbox environment for each new pull request to enable rapid prototyping and short feedback loops between client and server teams.

  • Graph Stats & Notifications power the deprecation workflow. They count uses of deprecated fields and send notifications when those fields are used.
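To make the idea of codified schema guidelines concrete, here is a minimal sketch of a schema linter. It is not GraphDoctor: the two naming rules, the schema representation, and every name in the example are assumptions chosen for illustration, since the article does not describe the actual rules.

```python
import re

# Hypothetical guidelines: PascalCase type names, camelCase field names.
CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")
PASCAL_CASE = re.compile(r"^[A-Z][a-zA-Z0-9]*$")

def lint_schema(schema):
    """Return a list of human-readable findings for a proposed schema.

    `schema` maps a type name to a list of field names.
    """
    findings = []
    for type_name, fields in schema.items():
        if not PASCAL_CASE.match(type_name):
            findings.append(f"type {type_name}: name should be PascalCase")
        for field in fields:
            if not CAMEL_CASE.match(field):
                findings.append(f"{type_name}.{field}: name should be camelCase")
    return findings

# A proposed schema change, as it might appear in a pull request.
proposed = {
    "Movie": ["movieId", "title", "release_date"],
    "cast_member": ["name"],
}
findings = lint_schema(proposed)
```

Running a check like this on every pull request gives authors feedback before a reviewer has to point out the same convention by hand.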

Shikhare continues his talk by highlighting that there are more problems that he and his team are actively working to resolve, such as sharing types between subgraphs, working around the limitations of Federation, passing context between subgraphs, and handling authentication and authorization.

Federation is not free. And it’s not going to solve all your problems magically. We had to build a lot of additional tooling, documentation, and developer education to make it work.

Shikhare closed his talk by offering a few recommendations if you want to adopt GraphQL in your organization:

  • Start with a monolithic GraphQL API and assign this effort to a single team, ideally with a mix of backend and UI engineers

  • As your GraphQL API grows, think about federation

  • Plan a coordinated GraphQL effort in your organization to avoid separate and isolated GraphQL APIs

  • Schema Design is absolutely table stakes. The amount of effort you invest in your schema design directly affects the success of GraphQL in your organization

  • Take a schema-first approach

  • Use a deprecation workflow to create a version-less GraphQL API

  • Take a product-driven approach

  • GraphQL truly shines for consumer and device APIs, but is not meant for everything
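The deprecation workflow recommended above depends on knowing who still uses a deprecated field. The following sketch shows one way such counting might work; the log format, client names, and field names are hypothetical, and the removal criterion is an assumption rather than Netflix's stated policy.

```python
from collections import Counter

# Hypothetical set of fields marked as deprecated in the schema.
DEPRECATED = {"Movie.boxArtUrl"}

def count_deprecated_usage(request_log):
    """Count, per client, how often deprecated fields were requested.

    `request_log` is an iterable of (client_name, [field, ...]) tuples.
    """
    usage = Counter()
    for client, fields in request_log:
        for field in fields:
            if field in DEPRECATED:
                usage[(client, field)] += 1
    return usage

def safe_to_remove(field, usage):
    """Assume a field can be removed once no client still requests it."""
    return all(used != field for (_client, used) in usage)

log = [
    ("tv-app", ["Movie.title", "Movie.boxArtUrl"]),
    ("mobile-app", ["Movie.title"]),
    ("tv-app", ["Movie.boxArtUrl"]),
]
usage = count_deprecated_usage(log)
```

With per-client counts in hand, the owning team can notify the remaining consumers and remove the field once usage drops to zero, which is what lets the API evolve without version numbers.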

Netflix’s engineering team also gave a talk on GraphQL Federation at a previous QCon Plus. Other talks on Modern APIs: Building and Evolving will be recorded and made available on InfoQ over the coming months.
