
Lambda Throttling - How to Avoid It?

Key Takeaways

  • AWS Lambda has a default concurrency limit of 1000 parallel executions that can lead to throttling.
  • There are suggested ways to work with synchronous and asynchronous invocations to avoid application and service throttling.
  • Async invocations are recommended for any event that doesn’t require an immediate response.
  • Where synchronous invocation is required, a good practice is to use an API Gateway; note that AWS doesn’t implement a retry mechanism for you in this case.
  • Benchmark results demonstrate that even if the application is throttled with asynchronous invocation, all events will eventually be handled by the lambda function, providing greater resilience.

Lambda is a managed resource. On the upside, this means it was designed for minimal management and maintenance. On the downside, it doesn’t mean Lambda is free from limitations. Therefore, when we choose to use serverless and lambda, we need to think about resource management issues as early as the design phase to avoid some of its common pitfalls.

One of these issues is its inherent concurrency limitations. It simply is not possible to run an infinite number of functions simultaneously. This makes sense, of course. As a service that needs to be highly available for millions of other users like us, there needs to be some predictability. While “serverless” for us, they (AWS in this case and for the sake of this article) do need to manage the servers and resources we opt not to run. And this would be an impossible feat with an unpredictable or infinite number of resources running in parallel. 

The benefit of this, though, is that this also provides a built-in guardrail for cost. It protects users from astronomical costs when mistakes are made––preventing the invocation of millions of lambdas accidentally. Therefore, by setting limits, AWS protects its users and their resources.

The default concurrency limit is 1000 parallel executions across the AWS account. Once you reach that limit, you will receive a `Throttling` status response code from the lambda service. [Note that you can request AWS to increase this quota when your product scales or your needs evolve, as the rate limit reflects the sum of all of your invocations. AWS will often verify that the increase is justified, as this comes with cost implications.]
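As a quick illustration, the account-wide limit can be read programmatically through Lambda’s `GetAccountSettings` API. Here is a minimal sketch using boto3; the `concurrency_headroom` helper is our own illustrative name, not part of the AWS SDK:

```python
def concurrency_headroom(account_limit: int, in_flight: int) -> int:
    """How many more executions can start before Lambda begins throttling."""
    return max(account_limit - in_flight, 0)


def fetch_account_limit() -> int:
    """Read the account-wide concurrent execution limit (needs AWS credentials)."""
    import boto3  # AWS SDK for Python
    client = boto3.client("lambda")
    return client.get_account_settings()["AccountLimit"]["ConcurrentExecutions"]


# With the default quota of 1000, 400 in-flight executions leave room for 600 more.
print(concurrency_headroom(1000, 400))  # prints 600
```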

When considering invocation rate limits, it is important to be aware that serverless is not the most economical option for every system; for some workloads, other design patterns should be evaluated.

What happens if you reach your concurrent execution limit?

The goal of this article is to explain best practices if you have throttled your application and services and suggestions for how to handle these cases. We performed an in-house experiment at Jit (a SaaS-based DevSecOps platform) built on serverless to learn how our application behaves.

Ways to invoke lambdas

Before we present the experiment results and code samples, let’s first describe the ways to invoke lambdas because this directly impacts how we handle this issue.

At the very least, we need to define sync vs. async invocations. With synchronous invocations, we expect to receive a response, and by design, our client waits for that response before advancing the code flow. With asynchronous invocations, we essentially send an event and forget about it. Async calls give us more freedom in managing the execution of the event; for example, we can retry with exponential backoff if we reach the concurrency limit (don’t worry though, AWS does that for you).
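In boto3 terms, the difference is just the `InvocationType` parameter of the `Invoke` API: `RequestResponse` blocks until the result is ready, while `Event` queues the event and returns immediately. A minimal sketch; the wrapper function and its name are our own:

```python
import json


def invoke_lambda(client, function_name, payload, asynchronous=False):
    """Invoke a function synchronously or asynchronously via boto3's invoke()."""
    return client.invoke(
        FunctionName=function_name,
        # "Event" = fire-and-forget (async); "RequestResponse" = wait for the result.
        InvocationType="Event" if asynchronous else "RequestResponse",
        Payload=json.dumps(payload),
    )
```

An asynchronous invocation is accepted with HTTP 202 right away; the Lambda service then takes over queueing and retrying the event.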

Let’s also describe the lambda service mechanism a bit.

Lambda asynchronous invocation mechanism

Lambda is a service that runs your code functions. For this article, when saying lambda in short form, we are not referring to the functions but the service that runs the functions.

Lambda, the service, has an internal queue that serves as a buffer between the invocation and your code. For instance, if you send a message through an event bridge or simple queue service (SQS), it doesn’t invoke your function directly; it has an internal state management mechanism.

How does this apply to the topic at hand?

When your lambda is throttled because you have reached the maximum parallel execution limit, the Lambda service returns a throttling error. Lambda has a retry mechanism with exponential backoff: the retry interval starts at 1 second and grows to a maximum of 5 minutes, and the service keeps retrying a failed event for up to 6 hours (by default) to try to complete its execution.

We should also mention that, for better error-proofing your code, you could use a dead-letter queue (DLQ), which other queues can target for messages that can’t be processed/consumed successfully. A DLQ covers the cases where an event still fails to execute after all retries, but that is just for reference, and we will not dive into it now.
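Both knobs mentioned above, the retry/age window and an on-failure destination, can be set through Lambda’s `PutFunctionEventInvokeConfig` API. A sketch assuming boto3; the helper names are our own:

```python
def async_invoke_config(max_age_seconds=21600, max_retries=2, on_failure_arn=None):
    """Build the arguments for Lambda's PutFunctionEventInvokeConfig API."""
    config = {
        "MaximumEventAgeInSeconds": max_age_seconds,  # 21600 s = the 6-hour default
        "MaximumRetryAttempts": max_retries,          # 0-2 retries after the first try
    }
    if on_failure_arn:
        # Send events that exhaust their retries to a DLQ (e.g. an SQS queue ARN).
        config["DestinationConfig"] = {"OnFailure": {"Destination": on_failure_arn}}
    return config


def apply_config(function_name, **kwargs):
    import boto3  # needs AWS credentials
    boto3.client("lambda").put_function_event_invoke_config(
        FunctionName=function_name, **async_invoke_config(**kwargs)
    )
```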

The meaning of this is tremendous. It doesn’t matter whether you send a message with SQS, EventBridge, or another async service; you will practically never need to think about handling throttling issues.

I’ll unpack this. It means that even if you have a limit of 1000 concurrent lambdas and you send 2000 events through Eventbridge configured for asynchronous invocation, you will reach the throttling limit. However, in contrast to synchronous invocation, this will not impact your application and service level agreement (SLA), as the events will be kept in the internal Lambda service queue and handled in time when the resources have freed up to manage them. Every single one of them.

Let’s see this in an actual code example: an async invocation example using EventBridge.

To simplify the example, we will use a Lambda attribute called `reservedConcurrency`. This attribute will limit the lambda function to a small concurrency number allowing us to run the test without invoking hundreds of instances.

In this example, we will use the serverless framework to set everything up:

handler: async_target.handler
description: test async throttling with limit on reserved concurrency
reservedConcurrency: 5
events:
  - eventBridge:
      eventBus: test-bus
      pattern:
        source:
          - 'async-reserved'

This code snippet will invoke the lambda enough times to intentionally reach the concurrency limit and trigger the throttling error:

import json
import boto3

events = boto3.client("events")
for _ in range(20):
    events.put_events(
        Entries=[{
            "EventBusName": "test-bus",
            "Source": "async-reserved",
            "DetailType": "test-event",  # illustrative value; not shown in the original
            "Detail": json.dumps({"payload": "data"}),
        }]
    )

The AWS console provides visibility into your serverless operations, and we can now check the execution outcome of our target lambda function.

We invoked the function 20 times, as we can see in this graph:

And we had a maximum concurrency of 5 parallel functions as shown in this graph:

The throttling retries can be seen in this graph:

What did we learn?

We can see that with the asynchronous invocation, all of the events were eventually handled by our lambda function, including all of the throttling and retries that AWS Lambda handled. This means we can send the events using Eventbridge, SQS, SNS, or any other asynchronous mechanism and achieve the same outcome.
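For instance, the same burst of events could be pushed through SQS instead. The batching helper below is our own sketch; note that SQS accepts at most 10 messages per `SendMessageBatch` call:

```python
import json


def batch_entries(payloads):
    """Build SendMessageBatch entries; each entry needs a batch-unique Id."""
    return [
        {"Id": str(i), "MessageBody": json.dumps(p)}
        for i, p in enumerate(payloads)
    ]


def send_all(queue_url, payloads):
    import boto3  # needs AWS credentials and a queue wired as the Lambda trigger
    sqs = boto3.client("sqs")
    for start in range(0, len(payloads), 10):  # respect the 10-message batch limit
        sqs.send_message_batch(
            QueueUrl=queue_url,
            Entries=batch_entries(payloads[start:start + 10]),
        )
```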

The critical thing to note is that this works the same way even if we reach the 1000-execution account limit and not solely a single lambda function’s limit. However, it is also important to note that, like everything else, this has limitations. Ultimately, the retries have a maximum retry window of 6 hours. If your scenario requires more resources, a larger retry window, or greater fault tolerance, you may need to create a more robust implementation, such as a dead-letter queue (DLQ) for events that don’t complete before the 6-hour window is up, or additional notifications to surface such failures.

Synchronous invocation example using API Gateway

There are scenarios where synchronous invocation is required. In these cases, we wait for a lambda response to continue running the rest of the service flow. For these situations, a good practice is to use an API Gateway. In this case, AWS doesn’t implement a retry mechanism for you. This makes sense, as we likely can’t wait 6 hours for a response as a client or another lambda function is awaiting a response.

When the lambda is throttled, it returns a “429” response code as before, but this time the call goes through the API Gateway. Surprisingly, the API Gateway doesn’t reflect the 429 status code and instead returns a 500 status code, which means “Internal Server Error.” In a way, this also makes sense (albeit not intuitively), as clients don’t know, and don’t need to know, that there are internal resource limitations on the server side. The bottom line is that if you are getting throttling errors behind your API Gateway, you are doing something wrong, and you may need to revisit your architecture to avoid getting into such a state.

If your use case really does require such a large concurrency limit, you can request that AWS increase your limits. It’s worth noting again that if you are hitting maximum capacity all the time, meaning your lambdas are up and running constantly, serverless may no longer be more economical than just having servers running all the time. At large scale, more concurrent invocations are required, and since serverless is priced per invocation, its cost benefits are nominal when you are constantly at capacity limits. As for the 429-to-500 mapping, this behavior is a known issue in AWS, and they are in the process of changing the way lambda error codes are reflected through the API Gateway (follow their release notes to see when this is resolved).

Ok, now, onto the fun part. Let’s see the experiment!

Let’s set up the experiment

First, let’s limit the concurrency of our function using the same `reservedConcurrency` attribute:

handler: synced_target.handler
description: test synced throttling through API GW with limit on reserved concurrency
reservedConcurrency: 3
events:
  - http:
      method: get
      path: /invoke-sync-concurrency-reserved

Next, we will invoke the function with this code:

from multiprocessing import Pool

import requests


def f(x):
    url = "<>"  # the API Gateway endpoint URL (redacted in the original)
    response = requests.get(url)
    return response.status_code, response.json()


if __name__ == '__main__':
    with Pool(10) as p:
        print(p.map(f, range(4)))

And the result is:

[(200, 'OK'),
(500, {'message': 'Internal server error'}),
(200, 'OK'),
(200, 'OK')]

So we can see that three invocations succeeded and one of them failed with the error code 500.

If you explore the API Gateway logs, you will see the following:

You can see in the logs that the lambda itself returns a 429 error, but then the API Gateway returns a 500 Error.

What have we learned?

When using synchronous communication, we need to be much smarter about our design to make sure we can avoid issues like the throttling of our services. While you can implement a retry on the client side when you receive a `500` error, it is extremely difficult to understand whether the issue is throttling or something else entirely, particularly if you are an API user.

That is why, as a best practice, it is always good to set your systems up to retry failed invocations. As a service developer, it is also important to turn on your API Gateway logs so you can be aware of these issues and track down the underlying cause of generic error codes. In addition, issues like this are probably a good indication that you need a better service design. When that isn’t immediately possible, at the very least, see how you can increase your concurrency limits to avoid hitting a wall.
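The client-side retry advice can be sketched like this; the set of status codes retried (429 directly from Lambda, 500 when the throttle is hidden behind API Gateway) and the helper name are our own assumptions:

```python
import time


def call_with_retries(request, attempts=5, base_delay=0.5):
    """Call `request()` and retry with exponential backoff on throttling-like errors."""
    for attempt in range(attempts):
        status, body = request()
        if status not in (429, 500):  # 500 may be a throttled Lambda behind API GW
            return status, body
        if attempt < attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status, body
```

With `base_delay=0.5` and five attempts, the final wait is 4 seconds, keeping the whole loop well under typical HTTP client timeouts.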

Final Thoughts

It’s always important to understand how our chosen technologies perform under the hood and what impacts their behavior. An important takeaway to avoid throttling when building on serverless architecture is to go with async invocation whenever you can. Reserve synchronous invocation for the use cases that require this, and make sure to build robustness and fault tolerance into the system through retries and proper visibility through logging.

Serverless is extremely flexible, with the added benefit of minimal operational overhead. However, it is important to know how to work around its limitations not to find yourself with service breakage. Plan and design your applications for resilience by playing to lambda’s strengths when it comes to scale and concurrency.
