
GraphQL Caching on the Edge


Summary

Max Stoiber discusses why and how to edge cache production GraphQL APIs at scale.

Bio

Max Stoiber is co-founder of GraphCDN. Previously, he worked at Gatsby, and before that at GitHub, which acquired his last startup. He's most widely known for creating open source projects used by millions of developers, including styled-components and react-boilerplate.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Stoiber: My name is Max Stoiber. I am the co-founder of GraphCDN, which is the GraphQL CDN. If you are in the React community, or in the JavaScript community more generally, you might have used some of the open source projects that I helped build, like styled-components, react-boilerplate, micro-analytics, and a whole bunch of others. I'm really active in that scene.

The Story of GraphCDN (2018)

The story of GraphCDN, and how we got there, started in 2018. At the time, I was the CTO of another startup called Spectrum. At Spectrum, we were building a modern take on the classic community forum, essentially trying to combine the best of what phpBB gave us 20 years ago with the best of what Discord and Slack give us nowadays. It was a public forum, but all of the comments on any post were real-time chat. Communities on Slack and Discord write lots of messages, but none of them are findable; we tried to make them public and a little bit more organized, so that you could find them afterwards on Google or elsewhere. That actually worked out surprisingly well, which led to quite a bit of user growth. As you can imagine, with all of this user-generated content, lots of people found us on Google and elsewhere, and started visiting Spectrum quite regularly.

Unfortunately, I had chosen a database that wasn't very well supported: RethinkDB, which nowadays doesn't even exist anymore; the company behind it shut down after a while. I'd chosen that database originally because they advertised themselves as the real-time database. Their key feature, the thing they praised externally, was that you could append .changes() to the end of any database query, and it would stream real-time updates for that query to you. You could listen to practically any data changes, which felt like a fantastic fit for what we were trying to do, because almost everything in Spectrum was real time: posts popped in in real time, the chat was real time, of course, and we had direct messages, which had to be real time. Lesson learned in hindsight: rely on the databases that everybody uses. There's a reason Postgres, MySQL, and now Mongo are as prevalent as they are. They work.
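To make that concrete, here is a minimal sketch of the changefeed API that made RethinkDB so attractive, assuming the official JavaScript driver; the table name and connection details are hypothetical.

```ts
import r from "rethinkdb";

async function watchPosts() {
  const conn = await r.connect({ host: "localhost", port: 28015 });
  // Appending .changes() turns a normal query into a live changefeed.
  const cursor = await r.table("posts").changes().run(conn);
  cursor.each((err, change) => {
    if (err) throw err;
    // Each event carries the previous and the new version of the row.
    console.log("old:", change.old_val, "new:", change.new_val);
  });
}

watchPosts();
```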

I'm a lot wiser now; I wasn't that wise back then. It very quickly turned out that the real-time nature of RethinkDB didn't scale at all. We had hundreds of thousands of users every single month, but RethinkDB couldn't even handle 100 concurrent change listeners. As you can imagine, every person that visits the website starts many different change listeners: we listened for changes to the specific post they were looking at, to the community the post was posted in, to new notifications. We had a bunch of listeners per user. Essentially, our database servers were on fire. Thankfully not literally, but they were crashing quite frequently. I Googled "servers on fire" and found this amazing stock photo of servers on fire; if your data center looks like that, you have some really serious problems. Ours weren't quite as bad, but they were still pretty bad.

We had this database that didn't scale, and we had to work around that limitation. We wanted to switch to a more well-supported database, but that's a lot of work: rewriting the hundreds of database queries we'd written and optimized up to that point, and migrating all that data without any downtime, was a whole project. We wanted to get there eventually, but we needed a solution right away, because we were crashing literally every day. As I was thinking about this, I realized we had an ideal use case for caching, because our API was really read-heavy: it's public data, lots of people read it, but not as many people write to it. We'd originally chosen GraphQL for our API because we had a lot of relational data: we'd fetch a community, all the posts within that community, the author of every post, the number of comments, a bunch of relational data. GraphQL was a fantastic fit for that use case. It worked out extremely well for us, and we really enjoyed building our API with GraphQL.

The one big downside we ran into was that there weren't any pre-built solutions for caching GraphQL at the edge, which is what we wanted to do. We wanted to run code in many data centers around the world, route our users to the nearest data center, and cache their data very close to them, both for very fast response times and to reduce the load on our servers. If you've ever used GraphQL, you know that this is essentially what GraphQL clients do in the browser. If you've heard of Apollo Client, Relay, or URQL, these GraphQL clients are essentially a fetching mechanism for GraphQL queries that very intelligently caches them in the browser for a better user experience.

How GraphQL Clients Cache

In my head, the question I wanted to answer was: can't I just run a GraphQL client at the edge? GraphQL clients do this in the browser, so why can't I take that same caching logic, put it on a server somewhere, and run it at the edge? To answer that question, I want to dive a little bit into how GraphQL clients cache. Let's look at an example GraphQL query that fetches a blog post by its slug, including its ID, title, and author, and for the author its ID, name, and avatar. There is one magic trick that makes GraphQL caching really great: the __typename meta field. You can add it to any object type in your query, and you get back the name of that type in the response. For this query, we would add __typename in two places: for the post and for the author.
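Here is a sketch of that query with __typename added in both places (field names assumed from the example):

```ts
import { gql } from "@urql/core";

// The example query, with __typename requested on both object types.
const GET_POST = gql`
  query getPost($slug: String!) {
    post(slug: $slug) {
      __typename # will come back as "Post"
      id
      title
      author {
        __typename # will come back as "User"
        id
        name
        avatar
      }
    }
  }
`;
```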

When the origin responds, the important piece is that we now have the post data and we know the type returned was a Post. The same for the author: we got the author data, and we know the author is a User. When we take this response and store it in our local cache in the browser, we can associate that cached query response with those two objects: the Post with the ID 5 and the User with the ID 1. We key the cached response by the query we saw, the getPost query, and any time we see the same query, we return the same data. Why are these tags relevant? Why do I care that this contains the Post with the ID 5 and the User with the ID 1? This is where the magic comes in. GraphQL also has something called mutations, which are essentially just actions; anything that changes data has to be a mutation. For example, imagine a mutation called editPost; in this case, we're editing the post with the ID 5 and changing its title. Any mutation also fetches whatever it changed, so here we get back the post.
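As a sketch, the cached response and the mutation the talk describes might look like this (names and values are illustrative):

```ts
import { gql } from "@urql/core";

// The response identifies each object, so the cached getPost result
// can be tagged with "Post 5" and "User 1".
const cachedResponse = {
  data: {
    post: {
      __typename: "Post",
      id: "5",
      title: "How to edge cache GraphQL APIs",
      author: { __typename: "User", id: "1", name: "Max", avatar: "…" },
    },
  },
};

// The editPost mutation selects what it changed, including __typename,
// so the client knows the Post with the ID 5 is now stale.
const EDIT_POST = gql`
  mutation editPost($id: ID!, $title: String!) {
    editPost(id: $id, title: $title) {
      __typename # -> "Post"
      id # -> "5"
      title
    }
  }
`;
```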

Again, we can do the same thing we did for the query and add the __typename field to the mutation's selection. When the response comes back from the origin, the client can look at it and go: we just sent a mutation, and the data that came back was the Post with the ID 5. I have a cached query response that contains that Post with the ID 5, so I can automatically invalidate the cached query result that now contains stale data for this post. That's amazing. This is what GraphQL clients do under the hood: magic invalidation based on the __typename and ID fields, combined to invalidate any stale data that has been changed at the origin.

There's one slight edge case where the magic ends: list invalidation. Imagine a query that fetches a list of blog posts, in this case just their ID and title. The response to this query is an array that contains the one blog post we have right now, the Post with the ID 5, "How to edge cache GraphQL APIs". A mutation that creates a new post now poses an interesting problem, because the response to this createPost mutation will return an object of Post with the ID 6, and our cached query result for the post list doesn't contain the Post with the ID 6. That's really annoying, because it means GraphQL clients can't automatically invalidate lists when new items are created.

Thankfully, there's a good workaround: manual invalidation. GraphQL clients give you APIs to manually influence the cache and change it depending on what passes through it. For example, with URQL, which is the third biggest GraphQL client, you can tell the cache that whenever a createPost mutation passes through, it should invalidate any cached query result that contains the posts query, the list of posts. That way, whenever a post is created, the GraphQL client automatically refetches the fresh data from the origin.
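With URQL's Graphcache, for example, that manual invalidation might look like this (the posts field name is assumed):

```ts
import { cacheExchange } from "@urql/exchange-graphcache";

// When a createPost mutation passes through the client, invalidate the
// cached Query.posts list so the next read refetches it from the origin.
const graphcache = cacheExchange({
  updates: {
    Mutation: {
      createPost(_result, _args, cache) {
        cache.invalidate("Query", "posts");
      },
    },
  },
});
```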

GraphQL clients actually go one step further and do something called normalized caching. Go back to our original query that fetches a single blog post, its ID, title, and author. Rather than taking the entire response, the Post with the ID 5 and the User with the ID 1, and putting that entire thing into the cache keyed by the query, they take each object within the query response and store it individually. Inside URQL's cache, the Post with the ID 5 corresponds to one record, and the User with the ID 1 corresponds to another. Why do this? Because now, if a query comes in that, for example, fetches the user with the ID 1, the cache can go: although we haven't seen this specific query before, we do have that data in the cache, so we can serve it on the client without going back to the origin to fetch it again. It was just deeply nested in some other query, but it has been normalized, and the cache can hand back the data for the User with the ID 1, just like that. That makes for less network traffic and a much nicer user experience, because things resolve much faster when they're already loaded on the client. You essentially only ever fetch every object once, which is fantastic, particularly if people navigate around your app frequently.

The one thing missing here, as you might have noticed, is post.author. We have the data for the Post with the ID 5 and the User with the ID 1, but how do we know that the post's author is the User with the ID 1? URQL stores that in a separate data structure that describes the relations, or links, between things. Essentially it says: if you're fetching the post with this specific slug, that corresponds to the Post with the ID 5; the Post with the ID 5's author corresponds to the User with the ID 1; and the User with the ID 1 doesn't have any further links you can traverse.
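Putting the two structures together, here is a sketch of the normalized cache for our example (illustrative, not URQL's exact internal representation):

```ts
// Each object is stored once, keyed by typename and ID.
const records = {
  "Post:5": { __typename: "Post", id: "5", title: "How to edge cache GraphQL APIs" },
  "User:1": { __typename: "User", id: "1", name: "Max", avatar: "…" },
};

// A separate structure records the links between entities.
const links = {
  'Query.post({"slug":"how-to-edge-cache-graphql-apis"})': "Post:5",
  "Post:5.author": "User:1", // the post's author is the user with the ID 1
};
```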

What I really want you to take away from this section is that GraphQL is actually awesome for caching, because of its introspectability: it tells you what data you're returning. That introspectability, combined with a strict schema that every response has to match, makes it really good for caching. That's also a lot of the reason why so much great tooling has sprung up around GraphQL. It has gotten such wide community adoption that if one person builds tooling for it, because it's always the same GraphQL spec to follow, everybody else gets to benefit from that tooling. That's incredibly powerful.

The Edge

To get back to the original question I posed way back in 2018: can't I just run a GraphQL client at the edge? Can't I take the logic that Apollo Client, Relay, and URQL have internally anyway, and put that same code on a bunch of servers around the world, at the edge, so that everybody who uses Spectrum gets super-fast response times and we massively reduce the load our server has to handle? The key to the answer lies in the last part, the edge, because as it turns out, GraphQL clients are designed with very specific constraints that differ ever so slightly from the constraints we would have to work with at the edge. One of the main ones is authorization. If a GraphQL client runs in the browser, it knows that whoever requests something that's already in the cache can access it, because it's the same person. If I'm using Spectrum and I query for the post with the ID 5, and the GraphQL client puts it in the cache, the client doesn't have to worry about authorization, or even know anything about it, because I am allowed to access the post with the ID 5. If I request the same post again, the client can just hand it to me from the cache.

At the edge, things are slightly different. If we have one server that a lot of users are requesting data from, some of them might be allowed to access the post with the ID 5, but others might not be. Or, more specifically, if you think about user data: somebody is allowed to access their own email, but nobody else's. We can't just take a query and put its result in the cache, because that would mean everyone gets served the same data. If somebody fetches data that's sensitive and specific to them, suddenly it would be served to everyone. That would be a terrible security nightmare; we would essentially be leaking data. Very bad idea.

At the edge, rather than making the cache key just a hash of the query, essentially the query text plus the variables, we also have to take the authorization token into account. Whether it's sent via the Authorization header or as a cookie, we add it to the cache key, so that if somebody else sends the same query, they don't get the same response. It's as simple as that: put the authorization token in the cache key, and everything will be fine.
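A minimal sketch of that cache key, assuming a Node.js runtime; the exact hashing scheme is illustrative:

```ts
import { createHash } from "node:crypto";

// Scope the cache key to the caller: the same query and variables with a
// different authorization token produce a different cache entry.
function cacheKey(query: string, variables: unknown, authToken: string): string {
  return createHash("sha256")
    .update(query)
    .update(JSON.stringify(variables ?? {}))
    .update(authToken)
    .digest("hex");
}
```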

Global Cache Purging

The other part that's a little bit different is cache purging. Not only do we have to do automatic cache purging and support manual invalidation for lists, we also have to do it globally. If you're running at the edge in data centers around the world, you have to invalidate data globally. If the post with the ID 5 changes, because a user sent a mutation to edit it or the origin told us it changed and invalidated it manually, you can't just purge it in one data center; the stale data would stick around in every other data center. That would be a terrible experience. You have to do it globally.

Fastly Compute@Edge

As we were thinking about these problems for GraphCDN, as we were building out this GraphQL edge cache, we came to the conclusion that we would use Fastly's Compute@Edge product. We are huge fans of Fastly, and the reason we chose them is that, like the name suggests, they are super-fast. Fastly has about 60 data centers worldwide, spread across the entire globe, and the number keeps increasing. Here is a crazy fact about Fastly's invalidation logic: if you take a query response, put it into Fastly's cache, and tag it with the Post with the ID 5, and you then send an API request to Fastly to invalidate any cached query result that contains the Post with the ID 5, they can invalidate that stale data within 150 milliseconds, globally. That is probably faster than you can blink. In the time it takes me to do this, Fastly has already invalidated the data globally. That is absolutely mind-blowing to me.
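Fastly exposes this as tag-based ("surrogate key") purging. A sketch, with the service ID and API token as placeholders:

```ts
// 1. When the edge caches a query response, it tags it with the objects
//    it contains, via a response header:
//
//      Surrogate-Key: Post:5 User:1
//
// 2. When the Post with the ID 5 changes, one API call purges every
//    cached response carrying that tag, in every data center.
const FASTLY_SERVICE_ID = "<service-id>"; // placeholder
const FASTLY_API_TOKEN = "<api-token>"; // placeholder

async function purgePost(id: string): Promise<void> {
  await fetch(
    `https://api.fastly.com/service/${FASTLY_SERVICE_ID}/purge/Post:${id}`,
    { method: "POST", headers: { "Fastly-Key": FASTLY_API_TOKEN } }
  );
}
```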

I actually looked up a while ago how fast the speed of light even is; surely it takes a while to go around the globe once. It turns out light takes about 133 milliseconds to travel around the entire globe. So how can Fastly invalidate within 150 milliseconds? The answer is, of course, that they don't have to go around the entire globe, because they propagate bidirectionally, both ways at the same time. They only have to go around half the globe, which cuts the time in half. Then, of course, they have a really fancy gossiping algorithm, which you can Google; they've written some great articles about it. I bow down to their engineers, because it's absolutely genius. It is so fast that it enables our customers to cache a lot more data. If you can invalidate stale data within 150 milliseconds globally, imagine how much more data you can cache, because it will never be stale: when the data changes, send an API request, and 150 milliseconds later, everybody globally has the fresh data. That's the reason we use Fastly. They're super-fast, and we're super happy with them.

That's essentially what GraphCDN is. We rebuilt this caching logic to run at the edge, taking authorization into account and doing global cache purging, and we deploy it to Fastly's Compute@Edge, 60 data centers worldwide, to let our customers cache their GraphQL queries and responses at the edge. I wish this had existed back in 2018 when we had our scaling problems at Spectrum. At the time, I just built a terrible in-memory caching solution that reduced the load slightly, until we eventually got acquired by GitHub. If we had had GraphCDN, we would have scaled much more smoothly and saved a lot of money, because running something at the edge is much cheaper than running every request through our entire infrastructure. It would also have been a much better experience for our global user base, because everybody would have had super-fast response times from their local data center.

Key Takeaway

The main thing I want you to take away is that GraphQL is amazing for caching. That's really the point I want to home in on: GraphQL is absolutely fantastic for caching. The introspectability, the strict schema, just absolutely fantastic.

Questions and Answers

Fedorov: Can you share some success stories and real-world numbers from actual production APIs that you currently manage and help cache? Any specific examples, and what domains are those numbers from?

Stoiber: One of our recent customers is italic.com, an e-commerce retailer. They pride themselves on selling really high quality products that are completely unbranded, so there's no Italic logo on anything, made in the same factories and with the same manufacturers that the big brands work with. You can buy a Prada bag without the Prada logo for much cheaper; that's the point of Italic. They were really worried about Black Friday. They had a huge traffic spike last year and really couldn't scale; their servers apparently crashed every few hours. About a month or two ago, with Black Friday coming up again, they started thinking ahead about how to solve this problem this year. They added GraphCDN to their stack in front of their GraphQL API. It reduced their overall server load by 61%, their database load by two orders of magnitude, and page load times by over 1 second. That's just one of the most recent ones I know off the top of my head who've been really successful with GraphCDN. We're seeing a lot more e-commerce companies signing up and putting us in front of their GraphQL APIs. One of them actually said in a customer call, "Milliseconds mean money." E-commerce isn't latency critical, but it's very latency sensitive: the faster they can render their web pages, the faster their APIs are, the more money they make. There's a very strong correlation there. Our product really helps them scale, but ultimately also make more money, because we can really reduce page load times across the globe.

Fedorov: Actually, I think it was Amazon back in 2009, if I recall correctly, who did one of the first studies linking milliseconds of delay on an e-commerce site to the revenue it generates. Definitely not a surprise to see more e-commerce retailers jumping on board.

Stoiber: There's a few studies like that: Walmart has famously done some, Staples I think has done some, Nike has done some. Practically everywhere, the outcome is: the faster your website, the more money you make. I was recently talking with one of our other customers about this, and there's a threshold to it. If you get from 20 seconds down to 10 seconds, it's not going to matter that much; you're still so slow that your conversion drop-off will be massive. If you can get from 10 or even 5 seconds down to 3 to 1 second, that's a huge difference. We enable our customers to get to speeds even faster than that everywhere around the world, even though they usually have a single data center; one data center in US-East, in Virginia, is the standard setup we see. That's really exciting for us, because hopefully they make a lot more money off the back of those performance improvements than they would ever have to pay us, which is quite amazing.

Fedorov: Let's dive a little bit more into the technical details. You mentioned that authentication is generally a challenge when you move data from the client into shared infrastructure like a CDN edge. What recommendations do you provide to API authors to categorize their APIs as authenticated versus generically shared? Can you share any more details on the internals of how it works?

Stoiber: What's interesting about a GraphQL API is that certain types and fields might be authenticated while others aren't. Maybe a blog post type is publicly available, with the same data for everyone, but you might also have a currentUser query that fetches the currently authenticated user, which obviously has to be specific to the authentication token present in the request. Essentially, we at GraphCDN, though I think every solution should do this, allow you to specify that currentUser, for example, is an authenticated query: if it's in the query, cache the entire result you got from the origin separately for every user, so that data isn't shared. Conversely, if only a blog post is in the query, cache it publicly, the same for everyone, whether they're authenticated or not.

That configuration aspect, saying which fields and types are specific to a user versus public, actually has interesting implications down the line, because with GraphQL you can query for many of these fields at the same time: you can query for a blog post and the current user in one request. A lot of our customers right now end up manually sending two requests to increase their cache hit rate. If you send both of these queries in the same HTTP request, you end up with a very low cache hit rate on the blog post, unnecessarily so, because you could have an almost 100% cache hit rate on it; every cached response suddenly has to be scoped to the user because it also contains the current user. So at the moment, our customers split these into two HTTP requests and send authenticated requests separately from public ones.
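A sketch of that manual split (field names assumed): the public query can be cached for everyone, while the authenticated one is scoped to the user.

```ts
const PUBLIC_QUERY = /* GraphQL */ `
  query getPost {
    post(slug: "how-to-edge-cache-graphql-apis") { id title }
  }
`;

const PRIVATE_QUERY = /* GraphQL */ `
  query me {
    currentUser { id email }
  }
`;

// Two HTTP requests instead of one combined query: the first is
// cacheable for everyone, the second only per authorization token.
async function loadPage(token: string) {
  const post = fetch("/graphql", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query: PUBLIC_QUERY }),
  });
  const user = fetch("/graphql", {
    method: "POST",
    headers: { "content-type": "application/json", authorization: token },
    body: JSON.stringify({ query: PRIVATE_QUERY }),
  });
  const [postRes, userRes] = await Promise.all([post, user]);
  return { post: await postRes.json(), user: await userRes.json() };
}
```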

However, that's also something really interesting that we're thinking about, because at the edge we can look at your configuration and go: hold on, currentUser is specific to the user and the blog post is public, so why don't we just split them automatically? We know which part of the GraphQL tree is specific to the current user and which part is public, so we could split them at the edge into two HTTP requests without you even having to worry about it, an under-the-hood optimization to make sure you get the highest cache hit rate possible. That's something we're thinking about. So far, we've been very conservative about messing with people's GraphQL queries, because that's where the danger lies. We've tried to stay as dumb as possible: we take the entire query, we cache the entire query, no magic, nothing can go wrong, it just works. Now, with feedback from our customers, we're starting to figure out which features we can provide on top of this safe way of doing things. Where can we be slightly less safe, but maybe get you a much higher cache hit rate in return? That is definitely one of the areas we're very much thinking about.

Fedorov: You mentioned that you ask users to specify which is a public and which is a private API. How is that declared? Is it a standard, something specific to your product, or something that could become part of the GraphQL spec?

Stoiber: There's two different ways you can specify this. The default way is that we have the concept of rules, which is very similar to page rules if you've used Cloudflare before. Essentially, you can define rules that say: if the query contains this type or field, then apply that cache configuration. You can create as many rules as you want: if it contains the current user, it's authenticated; if it contains a blog post, it's public and cached for a year, or whatever. You can set any configuration that way. That part is specific to our product.
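As an illustration only (a hypothetical shape, not GraphCDN's actual configuration format), such rules might look like:

```ts
// Hypothetical rule shape: the first matching rule decides how a
// query result is cached.
const rules = [
  { ifQueryContains: "currentUser", scope: "AUTHENTICATED", maxAgeSeconds: 900 },
  { ifQueryContains: "Post", scope: "PUBLIC", maxAgeSeconds: 60 * 60 * 24 * 365 },
];
```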

The other way is that we respect the origin's Cache-Control header, and that's a setting you can enable or disable as well. Particularly in the Node.js ecosystem, many GraphQL servers come with the ability to add cache-control annotations to types and fields in your GraphQL schema. In your schema, you can add things like the @cacheControl directive and say @cacheControl(maxAge: 900). The GraphQL server then goes through the query, figures out all the directives in the schema for the types and fields in that query, computes the Cache-Control header out of that, and sends it back with the response. We at GraphCDN can look at that header and go: you told us to cache this for 900 seconds, so we'll cache it for 900 seconds. We support that as well.
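A sketch of what those annotations look like, using Apollo Server's @cacheControl directive as one common example (the directive has to be declared in the schema; the Post type is carried over from the earlier examples):

```ts
import { gql } from "graphql-tag";

const typeDefs = gql`
  enum CacheControlScope {
    PUBLIC
    PRIVATE
  }

  directive @cacheControl(
    maxAge: Int
    scope: CacheControlScope
    inheritMaxAge: Boolean
  ) on FIELD_DEFINITION | OBJECT | INTERFACE | UNION

  # Every field under Post may be cached for 900 seconds.
  type Post @cacheControl(maxAge: 900) {
    id: ID!
    title: String!
  }
`;

// For a query that only touches Post, the server can respond with:
//   Cache-Control: max-age=900, public
// and the CDN caches the result for 900 seconds.
```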

Depending on your GraphQL server implementation, that doesn't quite have the same flexibility as our rule structure, which is why we added the rules; we knew some people were going to need more power than that. But we also support a standard Cache-Control header sent back from your origin, depending on what the query contains or doesn't contain.

Fedorov: Another question about the internals. You mentioned how you invalidate the data and you talked about relationships between objects. How do you invalidate the relationships? Does it work similarly, or are there any nuances?

Stoiber: What we do is essentially walk through the entire response. We take whatever you sent back from the origin, which contains the blog post, the author, maybe all of the comments and their authors, and we figure out which objects are in that response: a blog post, a user that is the author, a comment, and another user that is the comment's author. We tag the cached response with every single object we see in the data. Then you can ping us and say, "The user with the ID 5 has changed," and we go through and invalidate any cached query result that contains that specific object, that specific user. We don't even really have to be aware of relations; those are still defined in your GraphQL schema and your GraphQL queries, but we can still invalidate them, because everything has to come back through the response.
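A minimal sketch of that walk: collect a (typename, ID) tag for every object in the response.

```ts
// Walk the response data and collect a tag for every object that
// carries __typename and id, e.g. "Post:5", "User:1", "Comment:9".
function collectTags(node: unknown, tags: Set<string> = new Set()): Set<string> {
  if (Array.isArray(node)) {
    for (const item of node) collectTags(item, tags);
  } else if (node !== null && typeof node === "object") {
    const obj = node as Record<string, unknown>;
    if (typeof obj.__typename === "string" && obj.id != null) {
      tags.add(`${obj.__typename}:${obj.id}`);
    }
    for (const value of Object.values(obj)) collectTags(value, tags);
  }
  return tags;
}
```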

Fedorov: What observability tools are needed for edge caching? What findings and learnings did you integrate into your product?

Stoiber: We realized very early on that, with people passing all their GraphQL requests through us, we can provide analytics. We added GraphQL analytics to our system where you can see: how many requests are you getting? What is your cache hit rate per timeframe? Which queries are you seeing? Which mutations? What are their individual performance metrics, p50, p95, p99? What are their individual cache hit rates? Then we realized we have all this data on your queries, but we also have a lot of data on your errors, because we're in front of your infrastructure and see every single error your API sends back. So we added error tracking to our system. We have performance monitoring and error tracking where you can configure fine-grained alerts, very similar to what you might be used to from Sentry, or Datadog, or whatever. You can say, "If I see more than 50 GraphQL errors in a minute, a huge spike, probably something's going wrong and I should be aware of it." We can then send you an email, message you on PagerDuty, trigger an incident, whatever you want. The same is true for GraphQL and HTTP errors. That's the analytics we provide.

As we talk with more companies that are using GraphCDN in production, we've realized very quickly that caching is pretty scary. Adding caching to your stack comes with a lot of inherent risk: maybe the data isn't fresh anymore. How do you know what is even in the cache? How do you know all of it is fresh? That's something we're very much thinking about. We do solve a big problem for a lot of people, so people are already using us, but there's a lot of risk associated with implementing caching, and thus also a lot of hesitancy. We've been thinking about ways to get rid of that, and we have plans to work on more caching insights to give you a little more confidence in your caching.

Fedorov: What are the cases when CDN edge caching might not be a good idea? You mentioned that authentication is already one headwind, but especially thinking about invalidation and that 150-millisecond delay: on one side it sounds fast, but on the other side, for a use case like transactions, it might not be fast enough. Can you share more details on your general thinking and guidance here?

Stoiber: It definitely boils down to this: for latency-critical things, where we're talking about single milliseconds, you can't cache at the CDN edge; that doesn't make any sense. It also only makes sense for very read-heavy APIs. I have good friends who work at Sentry, the error tracking service. They get billions of errors sent to them, but only a small percentage are ever looked at. For them, caching doesn't make any sense, because their data changes way more frequently than it is read; they would probably have a 0% cache hit rate. If you have a very write-heavy API, caching is probably not for you. On the other hand, if you have a very read-heavy API that is latency sensitive but maybe not latency critical, that's really where GraphCDN, or edge caching generally, comes in. It can be a very powerful tool to help you scale, and also make your stack way more performant across the globe.

 


 

Recorded at:

Oct 06, 2022
