At QCon London 2024, Karthik Ramgopal and Min Chen described how LinkedIn changed the remote procedure call (RPC) protocol for 50,000 production endpoints from Rest.li to Google's gRPC. The migration is ongoing, and the team plans to automatically migrate 2,000 services to gRPC bridged mode without business interruption.
Rest.li defines RPC interfaces with the Pegasus Data Language (PDL). It generates Java classes from PDL, which use JSON over HTTP as the wire protocol. LinkedIn open-sourced Rest.li, but it saw only limited adoption outside the company. gRPC, on the other hand, is widely used. It started at Google but is now a Cloud Native Computing Foundation (CNCF) incubation project.
The auto migration to gRPC has two principles. First, an intermediate gRPC bridged mode is introduced to run Rest.li and gRPC side-by-side on clients and servers. So, the existing code continues to run without manual changes, while the new code can use gRPC. Second, custom schema translators and auto-generated interop bridge classes automatically convert PDL to gRPC proto schemas and vice versa. That's why gRPC proto schemas can be the source of truth, as any changes there automatically flow back to PDL.
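The bridging idea can be illustrated with a minimal Python sketch. It is not LinkedIn's generated code: the class names `LegacyGreetingResource`, `GreetingBridge`, `GreetingRequest`, and `GreetingResponse` are hypothetical, and plain dataclasses and dictionaries stand in for generated proto messages and Pegasus records. The point is only that the bridge converts between the two message shapes so the existing business logic runs unchanged.

```python
from dataclasses import dataclass


@dataclass
class GreetingRequest:
    """Stands in for a generated gRPC request message (hypothetical)."""
    name: str


@dataclass
class GreetingResponse:
    """Stands in for a generated gRPC response message (hypothetical)."""
    text: str


class LegacyGreetingResource:
    """Stands in for existing Rest.li business logic working on record-like dicts."""

    def get(self, record: dict) -> dict:
        return {"text": f"Hello, {record['name']}"}


class GreetingBridge:
    """Hypothetical interop bridge: exposes a gRPC-shaped method and
    delegates to the untouched legacy resource."""

    def __init__(self, legacy: LegacyGreetingResource) -> None:
        self._legacy = legacy

    def get_greeting(self, request: GreetingRequest) -> GreetingResponse:
        record = {"name": request.name}            # proto-style message -> legacy record
        result = self._legacy.get(record)          # existing code path, no manual changes
        return GreetingResponse(text=result["text"])  # legacy record -> proto-style message


bridge = GreetingBridge(LegacyGreetingResource())
reply = bridge.get_greeting(GreetingRequest(name="Ada"))
```

In this sketch, new callers only see `get_greeting`, while old callers can keep calling `LegacyGreetingResource.get` directly, which mirrors running both protocols side by side.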
Servers and clients can be migrated independently thanks to the gRPC bridged mode. LinkedIn has developed a dry-run process that simulates the migration in a test environment to discover unexpected bugs before they reach production. Errors found there are filed as Jira tickets. Once all traffic has shifted from Rest.li to gRPC, LinkedIn plans to move clients off the bridged mode to native gRPC. This will eventually allow the servers to do the same and use native gRPC.
Because LinkedIn does not use a monorepo for its applications, it could not update all code at once because of PDL dependencies. So, it built an orchestrator that manages the migration through computed dependency ordering. The gRPC bridged mode is just a stepping stone for LinkedIn on the way to native gRPC. For that final step, code modifications based on an Abstract Syntax Tree (AST) proved insufficient, so LinkedIn is planning to use generative AI code modifications for the future native migration. That approach has already worked for two other migrations at LinkedIn.
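The talk summary does not say which algorithm the orchestrator uses. One common way to compute a dependency ordering is a topological sort, sketched below with Python's standard `graphlib` module (Python 3.9 or later). The service names and the `migration_waves` helper are hypothetical; the sketch groups services into waves so that each wave depends only on services migrated in earlier waves.

```python
from graphlib import TopologicalSorter


def migration_waves(depends_on):
    """Group services into waves; every service in a wave depends only on
    services from earlier waves. `depends_on` maps a service to the set of
    services whose schemas it consumes."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical repositories and their PDL dependencies.
deps = {
    "feed-service": {"profile-schemas", "common-schemas"},
    "profile-schemas": {"common-schemas"},
    "messaging-service": {"common-schemas"},
    "common-schemas": set(),
}

waves = migration_waves(deps)
```

Services within one wave have no dependencies on each other, so in this sketch they could be migrated in parallel.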
LinkedIn wants to deprecate Rest.li because it does not support streaming, deferred responses, or deadlines. It's also relatively slow and has insufficient support for non-Java clients and servers. gRPC, on the other hand, supports bidirectional streaming, deferred responses, and deadlines. It has excellent performance and broad programming language support. That translates into a lower infrastructure cost.
gRPC does not have some Rest.li features, such as inheritance, required fields, custom default values, type unions, and custom types. The translators turn these features into so-called "Pegasus Custom Options" in the gRPC proto schema. Later, the translators generate custom code for these options, such as checking whether required fields contain a value.
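As a rough illustration of option-driven validation, the following Python sketch attaches a hypothetical `pegasus_required` marker to fields as metadata and checks it at runtime. It does not reproduce LinkedIn's actual option names or generated code; dataclass metadata stands in for custom options in a proto schema, and `Member`, `pegasus_field`, and `missing_required` are invented for the example.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


def pegasus_field(*, required: bool = False):
    """Declare a field carrying a hypothetical Pegasus-style option as metadata.
    Every field defaults to None, mimicking fields that can be left unset."""
    return field(default=None, metadata={"pegasus_required": required})


@dataclass
class Member:
    """Hypothetical message; `id` was a required field in the original schema."""
    id: Optional[int] = pegasus_field(required=True)
    headline: Optional[str] = pegasus_field()


def missing_required(message) -> list:
    """Return the names of fields marked required that have no value."""
    return [
        f.name
        for f in fields(message)
        if f.metadata.get("pegasus_required") and getattr(message, f.name) is None
    ]


incomplete = missing_required(Member(headline="Engineer"))
complete = missing_required(Member(id=42))
```

A generated validator could run such a check before a message is handed to business logic, which preserves the required-field guarantee even though the wire format itself no longer enforces it.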