Web APIs and the n+1 Problem
You are probably familiar with the n+1 problem: when a request to load one item turns into n+1 requests since the item has n associated items. The term has been mainly described in the context of Object Relational Mappers (ORMs) when lazy loading child records have caused additional calls to the Relational Databases (RDBMS) resulting in poor performance, locks, and timeouts.
However, this problem is not confined to ORMs. If we stick to the definition above, we see it can apply elsewhere, and in fact we see a rising trend in its occurrence. Nowadays, our data are usually served by Web APIs, 1) by composing data from multiple APIs, 2) by using a distributed cache, or 3) by reaching to a NoSQL data store. We will demonstrate how n+1 problems can occur in each of the scenarios above.
An n+1 problem on the server has many manifestations. Slow responses, thread starvation, and inconsistent performance are some of the symptoms. In this article we will look into the background to the problem and propose several patterns and solutions.
n+1 and NoSQL
Every day we move further away from RDBMS. NoSQL movement is well established now with many companies already using or busy building applications that sit on top of one or more NoSQL stores. Whether it is MongoDB, Cassandra, RavenDB; soft stores such as Redis, or even the cloud variants such as Azure Table Storage or Amazon's Dynamo, it is all moving very fast.
By moving away from RDBMS we have successfully freed ourselves from the restricting schema-bound limitations of RDBMS and been able to churn code very quickly. This has allowed for higher agility of teams. While the flexibility of the schema is useful and some future changes can be absorbed by the application, decisions about whether include child items with the parent is an important one which best taken early on.
In traditional RDBMS-based applications, one would join parent and child tables - if both needed to be returned - and this join is usually extremely efficient and quick. In the NoSQL world, we can store the parent along with its children. While this is fine for many scenarios, it is not always possible. When the relationship is one of aggregation, an entity has an identity independent of other entities that it has association with. For example, a school, classroom, subject, teacher, and student each have an identity of their own and it is not possible to define a single aggregate root. One mistake is to turn aggregation into composition. While this sometimes works, it can lead to very coarsely granular entities that are very big to move around; for example defining a customer with all its orders as the aggregate root. On the other hand, such entities can become a point of contention within the system when multiple processes need to change the aggregate root at the same time.
So some NoSQL document databases provide the means to store the reference of one document in another - for example MongoDB provides Concept of ObjectId which is a reference to another document. This is exactly what leads to an n+1 problem in NoSQL databases: your one query turns into n+1 query to return all related documents.
n+1 and Batch API
Many APIs provide a batch API so a client can achieve multiple steps in a single call. There is a variant where multiple entity IDs are passed to the API to return all those entities.
While a batch API protects the client from having to deal with n+1 problems, it usually leads to an n+1 on the server unless the problem can be absorbed by a backend RDBMS query or a native batching support in the backend NoSQL store.
n+1 and API Composition
By adopting SOA each type of data is segregated into its own bounded context. This is even truer when working with micro-services. So in order to provide a data useful to the clients, APIs need to aggregate data from multiple services and compose them before providing the data to the clients. This is predominantly seen on the presentation layer or a composition API (where a service's sole responsibility is to gather data from several APIs and expose as a consolidated API) where calls are from heterogeneous sources. This by definition is an n+1 problem.
n+1 and Distributed caching
This is not a very typical case, however it happens more often than you may think. If you have a couple of items to retrieve from the cache it is not a big problem, however, if you are reading hundreds of items, it starts to take its toll. The more you design the site for better performance, the more you push data to the cache, increasing your dependency. Using Redis or similar soft stores gives you a quick response but with hundreds of items, you are looking at a few hundred milliseconds extra just for the calls. Parallelising the calls can be used to negate the effect and some client libraries provide pipelining capabilities. This can increase the risk of thread starvation depending on your system. If you need to make many calls to an external cache in order to serve one request, you are basically hitting an n+1 problem, which sooner or later will manifest itself in your performance metrics.
New patterns needed
n+1 is not always avoidable - building linearly scalable systems can lead to n+1 cropping here and there and we need be prepared. Making multiple requests is usually inevitable and warrants new patterns to make calls to the servers resilient and optimally performant.
In this post we will look at some of these patterns. Before doing so, we present a model scenario based on a familiar eCommerce use cases. For the purpose of this post, let's assume we have a Web API that needs to compose its data by calling other APIs. And let's assume for every request, we have to make 10 requests.
In our example scenario we are consuming services that in most cases behave well but occasionally calls will take longer. In 99% of cases response time is 10-100ms but 1% will be 5-25sec. We assume distribution of likelihood among each of these two ranges are even (e.g. likelihood of 10ms is equal to 50ms or 100ms).
If we calculate the average time it takes to make a call, we get to the value of ~ 200ms. If we sequentially call these services, on average we are looking at 2 seconds to complete all calls which is not acceptable.
Now let's look at calling these services in parallel. The call would take as long as the slowest call. We know that calls may take 5-25 seconds in 1% of cases, however, when 10 calls are involved this likelihood increases to 9%! (1.0 - 99%^10) We need to do something about this.
Denormalize and build read models
While data modelling is not a Web API problem, when it involves an n+1 case it becomes one - hence we have to look closely at this case. If you have a parent-child or multiple relationship and in order to load an item you have to make more than one query, you have an n+1 problem.
A common pitfall is to approach the data model design using the traditional RDBMS mind set and treat every conceptual model as an entity. In many cases, the presentation layer only requires value objects to populate the screens, the right data model changes with the screen itself.
First of all, many items can simply become value objects or composed entities within the aggregate root. For example, if a blog comment is only a user name and text and can be only accessed through the blog post, define it as a value object and store it with the blog post. If we need to supply a link so that comment can be shared and by clicking the link, blog post is shown and the page is scrolled down to the comment position, we define it as an entity but embed it along with the blog post.
What if we need a signed-in user to be able to see all the comments he has left? One option is to have an embedded comment entity as above in the Blog Post domain and to have another embedded comment entity with the user and store the first few characters and its link inside the User domain.
There are drawbacks with bloating an aggregate root with embedded child objects. First of all size can become a constraining factor (e.g. a popular blog post with thousands of long comments) or if the item has a fluid state and gets updated many times (risk of concurrency). These drawbacks need to be weighed against the benefits.
This is in essence, the most important solution to the n+1 problem - although it requires other steps to be in place to work properly. Instead of making n calls sequentially, we parallelise the calls and wait for the successful completion of all before returning the data. This approach will make your code notably more complex and fragile unless you abstract out the parallelisation. In .NET framework you can use Parallel.For to make n parallel calls. However, Parallel.For is not designed for asynchronous operations and you have to resort to Task.WhenAll and pass n async operations to run in parallel. Here is an example in C#:
var client = new HttpClient();
var responses = await Task.WhenAll(
}.Select(url => client.GetAsync(url)));
In the simplistic form above, if any of the code paths throw exceptions, the whole set will fail. So the error handling and retry needs to be built into the call (see below).
Using Async patterns
It is often said that in terms of security, your site/service/application is as vulnerable as its weakest point. This rule also applies to the performance of your site: your API/site is at least as slow as the slowest path of your system.
It cannot be counted the number of times that web sites/APIs grind to a halt with a transient slow-response glitch in one of the 3rd-party or in-house services. What generally happens is the first few requests are handled fine but then you soon run out of threads in the thread pool and start queuing on the IIS. Responses will take longer and longer until your IIS queue gets full (default value is 5000) after which you get 503 responses.
Async calls (now supported by many languages, tools, and platforms) take advantage of IO Completion Ports (IOCP) and free the worker thread to go back to the pool and serve other requests. After the completion of an IO operation, IOCP will notify back the method to resume its operation where it left it. Any IO bound operation, including network access, (such as reading/writing to files, calling other APIs and services, or accessing database or out of process caches) can potentially use IOCP. On the .NET framework, method pairs prefixed with Begin- and End- have historically supported IOCP (for exampleBeginWrite and EndWrite on the Stream). Most of these have a new Async counterpart which is a single method with an -Async postfix (for example WriteAsync). Nowadays with async/await syntax in C#, writing async code is very easy. To take full advantage of async programming, you need to go async all the way so your server code is chained by async calls from the entry point until its exit, otherwise the thread is not freed and you might end up thread-deadlocking yourself.
Optimising threading model and network throttles
Using asynchronous patterns and making calls in parallel puts a strain on the threading backbone which is usually not designed for such a heavy usage of threads. While asynchronous calls release your threads back to the thread pool, you still need enough threads to initiate the process. Also in some cases you might have to use non-async libraries in parts of your code. In this case, the effect is magnified if you call asynchronous calls within a synchronous block since you have to resort to offloading your call to another thread to avoid thread deadlocks:
Task.Run(() => item.MethodAsync()).Wait();
.NET ThreadPool is designed to be self-regulating. ASP.NET by default uses a feature called autoConfig which is set in processModel element of the machine.config. It is on and it takes care of network throttling and minimum and maximum worker threads and IOCPs by default. Based on experience, I have found these not to be sufficient for the heavy use of threading proposed in this article. So I would normally turn off the autoConfig in the machine.config located at %windir%\Microsoft.NET\Framework64\v4.0.30319\Config(for a 64-bit machine):
With this change you need to make sure to add these codes at the application startup:
ThreadPool.SetMinThreads(250, 250); // a reasonable value but use your own
ServicePointManager.DefaultConnectionLimit = Int16.MaxValue;
First value in the SetMinThreads call is the minimum number of worker threads and second parameter is the minimum number of IOCPs. Having said that, ThreadPool has a habit of regulating itself so it is normal to see this number decrease after a period of inactivity. In fact, setting the values above is not the main reason to turn autoConfig on. The most important reason is to allow for a sharp increase in the number of threads in case of burst activity. With autoConfig on, only a couple of threads are added every second and this is not quick enough to handle burst activities leading to Thread Starvation.
While the timeout and retry pattern is not a novelty by any means, it is essential for building systems that can deal with transient failures and environment instability causing calls to fail or take too long, while retrying succeeds very quickly. It is important to build retry for your command but even more important to implement timeout and retry for your queries. Netflix has successfully used aggressively low timeouts in their APIs. One of the reasons behind this aggressively low timeout is the principle of Fail Fast and next time you are very likely to get a performing server, giving you the response very quickly. An interesting choice is to include a longer timeout for the last retry.
So let us see how this could improve our performance compared to the performance we had above. As we calculated, with no timeout/retry the likelihood of the call taking 5-25 seconds was 9%. Let us use 2 retries (total 3 calls), and use an initial timeout of 100ms coupled with a final timeout of 30 seconds. Using this approach, the chance of getting a timeout drops to 0.1% [1 - (0.99 * 0.99) ^ 10] so we have reduced the likelihood by 900 times! Similar to parallelisation, the retry-timeout code needs to be streamlined into a coding pattern so the callers do not have to implement it everywhere.
While the timeout of 100ms is selected conveniently based on our arbitrary scenario setup, in real world you would choose a percentile of the response time. Here we have chosen 99th percentile. Choosing a value depends on your own service's defined SLA and response time distribution.
A short timeout and retry cycle is useful when the long responses and errors are due to glitches and transient failures. When a service is generally having a bad time, overloading it even with more requests causes further disruptions. A common pattern is to implement a circuit breaker which will stop sending any requests to that particular service and start using an alternative means of fulfilling the request. It is obvious a circuit breaker can be useful when we have an alternative means of fulfilling the request. Netflix successfully uses a cache local to the service when the circuit breaker gets activated. Sometimes it is acceptable to send back data which has some missing parts.
While timeout-retry is a request level implementation, circuit breaker is a global switch and once activated for one request, it will apply to all requests. Conditions based on which circuit breaker is activated vary but include receiving a particular error code or timing out on subsequent requests. Circuit breakers can be reverted manually by an admin but most commonly they have a timeout (for example 15 minutes) and are automatically lifted: If the service is still down, they will be activated again.
As we explained, n+1 with accessing external caches can cause an n+1 effect. While the response time for accessing cache is usually trivial, they could still impose a strain on the system's threading. Since most servers contain an abundant amount of memory, it is possible to use the memory as the first level hit of the cache. In other words, build a layered cache where a local memory cache with a shorter expiry sits in front of a distributed cache with standard expiry. Cache miss ratio is proportional to the number of servers you have in same zone. If the number is large, the cache miss ratio will be high and cache is ineffective but with if you are have less than 10 servers (and certainly less than 5) this approach can lead to a measurable improvement in service quality.
This pattern can be implemented invisibly to the consumers of the cache so the distributed cache client is in fact an in-memory cache wrapping a distributed cache.
n+1 is not merely a data problem: it can impact the resilience of Web APIs serving the data. API composition produces a similar effect and is akin to similar adverse effects on the resilience and health of the API. While n+1 can sometimes be averted in the source by more denormalized data modelling, Web APIs need to be equipped with the tools and patterns to combat the negative impact and successfully maintain their Quality of Service.
Parallelising calls to APIs and cache or data stores is very important however, this could put a strain on the server's thread pool. Using async calls and optimising the threading knobs and network throttles is mandatory. On the other hand, using short timeout plus retry helps with consuming an API with varying response time and transient glitches. Use circuit breakers to handle more permanent failures, but you will need to design and build an alternative to the failed service.
About the Author
Ali Kheyrollahi is a Solutions Architect, author, blogger, open source author and contributor currently working for a large eCommerce in London. He is passionate about HTTP, Web APIs, REST, DDD and conceptual modelling while staying pragmatic in solving real business problems. He has +12 years experience in the industry working for a number of blue chip companies. He has a keen interest in computer vision and machine learning and has published a few papers in the field. In his previous life, he was a medical doctor and worked for 5 years as a GP. He blogs here and is an avid twitter using handle @aliostad.
Craig Motlin Sep 01, 2014