Key Takeaways
- Cold starts in AWS Lambda are often misunderstood, leading to misconceptions about their frequency, duration, and impact.
- Various factors influence cold start duration, including choice of runtime, configuration settings, and Virtual Private Cloud (VPC) involvement.
- AWS provides strategies like SnapStart, Provisioned Concurrency, and VPC networking improvements to manage and mitigate cold starts.
- A case study on building a low-latency control plane API backend illustrates the effective management of Lambda cold starts.
- Understanding and effectively managing cold starts is crucial for optimizing serverless applications on AWS Lambda.
Overview
This InfoQ article dispels the common myths surrounding Lambda cold starts, a widely discussed topic in the serverless computing community. As serverless architectures continue to gain popularity, misconceptions about Lambda cold starts have proliferated, often leading to confusion and misguided optimization strategies.
We delve into what a Lambda cold start is, under what circumstances it happens, and how it impacts application performance.
The article also addresses strategies to mitigate Lambda cold starts and explains why they are not always the menace they are portrayed to be. By debunking these myths, we aim to provide a clearer understanding of Lambda cold starts, empowering developers to leverage serverless computing more effectively. At the end of the article, we share a use case of building a low-latency control plane application using AWS Lambda.
Understanding Cold Starts
A cold start is the extra startup latency incurred when an application that hasn't been used recently must be initialized before it can serve a request. For AWS Lambda, this happens when a new function instance needs to be bootstrapped to respond to an invocation. Factors influencing cold start duration include the choice of runtime, configuration settings, and whether the Lambda function is attached to a VPC.
It's essential to debunk some prevalent myths surrounding this phenomenon. Cold starts are often cited as a significant drawback of serverless computing, but understanding their impact requires a nuanced perspective.
While cold starts can introduce latency, their frequency and effect can be significantly mitigated with proper architecture and the mitigation features the platforms provide. Nor is this unique to AWS Lambda: Azure Functions and Google Cloud Functions are subject to similar initialization delays when a new instance is triggered.
A generic cold start vs. a cold start inside a VPC (screenshots from the video)
Debunking Five Myths
Cold starts in AWS Lambda often create confusion and misunderstanding, leading to the propagation of certain myths. This section debunks five of them, providing a clearer perspective on Lambda cold starts.
Myth 1: Cold starts happen frequently
A common misconception is that cold starts occur for every Lambda invocation. However, this isn’t the case. AWS Lambda reuses function instances whenever possible, resulting in cold starts only when a new instance is required. This might occur for the first request after a function is deployed, after a period of inactivity, or when scaling to handle increased load. Therefore, while cold starts do occur, they’re not as frequent as often believed.
Myth 2: Cold starts always take a lot of time
Another myth is that cold starts always result in significant latency and that adding extra memory/CPU will always overcome the impact. In reality, the duration of a cold start varies depending on several factors, including the runtime, function configuration, and whether the function is part of a VPC. While some cold starts may take a few seconds, others complete much more quickly. It's therefore incorrect to generalize that all cold starts result in lengthy delays.
Myth 3: Cold starts significantly impact all types of applications
The impact of a cold start is often overestimated. While it’s true that latency-sensitive applications might be affected by cold starts, many types of applications, such as background tasks or asynchronous processing jobs, are largely unaffected. Therefore, the impact of cold starts isn’t universally significant and largely depends on the nature of the application.
Myth 4: Regularly invoking Lambda functions with "no-op" calls keeps them warm
It’s a common misconception that persistently triggering a Lambda function with a no-operation (no-op) instruction maintains a pool of warm instances, ready to handle subsequent requests without the latency of a cold start. This strategy, however, is flawed due to unpredictable warm state management, the potential for wasted resources, the risk of throttling, and the existence of more efficient alternatives, such as AWS’s Provisioned Concurrency feature. Therefore, this practice is neither efficient nor recommended for managing Lambda function readiness.
Myth 5: A single cold start of duration "T ms" is worse than N cold starts of duration "T/N ms" each
One prevalent myth about Lambda cold starts is that enduring a single cold start of duration "T ms" is worse than experiencing N cold starts, each of duration "T/N ms." This belief stems from the idea that distributing the latency of a cold start across multiple invocations lessens its impact compared to a single, longer-duration cold start. However, this is a misconception.
Let’s consider an example where we have two scenarios:
Scenario A: Experiences a cold start penalty of 1 second each for 20% of invocations.
Scenario B: Experiences a cold start penalty of 500 milliseconds each for 40% of invocations.
Q: Assuming a warm-path latency of 100 ms, Scenario A's p100 is 1100 ms while Scenario B's p100 is 600 ms; which one is better?
When considering the p100 (100th percentile), Scenario B is better, as its worst-case response time of 600 ms beats Scenario A's 1100 ms. However, the evaluation changes at lower percentiles and with the trimmed mean: in Scenario A only 20% of invocations are cold, so its p70/p80 still reflects a warm request (~100 ms), whereas in Scenario B 40% of invocations are cold, pushing its p70/p80 to 600 ms. At the 70th/80th percentile and the trimmed mean at the 70th/80th percentile, Scenario A is therefore better, with lower values for both measures.
Latency distribution of Scenario A vs. Scenario B (screenshots from the video)
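To make the comparison concrete, here is a minimal, self-contained Java sketch that reproduces both scenarios with synthetic latencies, using the 100 ms warm-path latency assumed above; the sample count and helper names are illustrative:

```java
import java.util.Arrays;

public class ColdStartScenarios {

    // Assumed warm-path latency (ms), inferred from the p100 figures above.
    static final double WARM_MS = 100.0;

    // Build a sorted latency sample where coldRate of requests pay coldPenaltyMs.
    static double[] latencies(int n, double coldRate, double coldPenaltyMs) {
        double[] out = new double[n];
        int cold = (int) Math.round(coldRate * n);
        for (int i = 0; i < n; i++) {
            out[i] = WARM_MS + (i < cold ? coldPenaltyMs : 0.0);
        }
        Arrays.sort(out);
        return out;
    }

    // p-th percentile: the worst latency seen by the fastest p% of requests.
    static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }

    // tm-p: the mean of the fastest p% of requests (top outliers trimmed away).
    static double trimmedMean(double[] sorted, double p) {
        int keep = (int) (p / 100.0 * sorted.length);
        return Arrays.stream(sorted, 0, keep).average().orElse(0.0);
    }

    public static void main(String[] args) {
        double[] a = latencies(100_000, 0.20, 1000.0); // Scenario A
        double[] b = latencies(100_000, 0.40, 500.0);  // Scenario B
        System.out.printf("p100 A=%.0fms B=%.0fms%n", percentile(a, 100), percentile(b, 100));
        System.out.printf("p80  A=%.0fms B=%.0fms%n", percentile(a, 80), percentile(b, 80));
        System.out.printf("tm80 A=%.0fms B=%.0fms%n", trimmedMean(a, 80), trimmedMean(b, 80));
        // Prints p100 A=1100 B=600, p80 A=100 B=600, tm80 A=100 B=225:
        // B wins on p100, but A wins on p80 and tm80.
    }
}
```

Scenario B looks better if you only watch p100, but Scenario A wins on both p80 and tm80, matching the conclusion above.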
Learnings with Lambda Latency Measurement
- Plotting your latencies as a histogram or CDF helps visualize outliers in your Lambda function's latency.
- Tail latencies (outliers) indicate the impact of your Lambda function's cold starts.
- Outliers skew averages, so the average doesn’t represent typical behavior in a system with outliers.
- Percentile(p) and TrimmedMean(tm) are good ways to define your service API latency goal.
- p99 represents the worst (highest latency) experience across 99% of our customers' requests. It does not tell us how many of those 99% of requests came close to that worst case.
- tm99 represents the mean of the 99% of our customers’ requests that have been trimmed (by removing 1% outliers).
- A higher TrimmedMean(tm) indicates a higher frequency of cold starts, while a higher percentile(p) mainly indicates a higher duration of cold starts.
- A higher TrimmedMean(tm) is more concerning than a higher percentile(p), as it means a significant percentage of customers is experiencing elevated latency.
Observability around Lambda Cold Starts
Observability around Lambda cold starts is crucial because it allows developers to gauge the impact of initialization delays on the overall performance of their serverless applications. Developers can identify patterns and potential application bottlenecks by monitoring metrics such as the duration and frequency of cold starts. This level of insight is essential for implementing effective optimization strategies, such as adjusting function memory settings, tweaking timeout configurations, or optimizing the codebase to reduce initialization time.
1) AWS X-Ray trace for a Lambda: During a Lambda cold start, an "initialization" subsegment can be observed within an AWS X-Ray trace. This subsegment captures the function’s initialization activities, such as dependency resolution and the setting of global variables, which occur before the function handler is executed. This step happens only once per instance, preventing the need to repeat it with each function invocation. However, the presence of numerous dependencies can prolong this initialization phase.
2) AWS CloudWatch Logs for a Lambda: AWS Lambda includes the initialization duration in the REPORT message at the end of each invocation. However, this detail isn’t automatically available as a CloudWatch metric. Lambda also automatically captures and sends logs about each phase of the Lambda execution environment lifecycle to CloudWatch Logs. This includes the Init phase, in which Lambda initializes the runtime and runs the static code outside the function handler; the Restore phase, in which Lambda restores the execution environment from a snapshot for SnapStart-enabled functions; and the Invoke phase, in which Lambda executes the code in your function handler.
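Because the Init Duration in the REPORT line isn't published as a metric, one common approach is to aggregate it with a CloudWatch Logs Insights query over the function's log group. A sketch, parsing the REPORT format; the field name initDuration and the 30-minute bin are illustrative choices:

```
filter @type = "REPORT"
| parse @message /Init Duration: (?<initDuration>[0-9.]+) ms/
| stats count(*) as invocations,
        count(initDuration) as coldStarts,
        avg(initDuration) as avgInitMs,
        max(initDuration) as maxInitMs
  by bin(30m)
```

Only cold-start invocations carry an Init Duration, so counting the parsed field against the total yields both the frequency and the duration of cold starts over time.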
3) AWS CloudWatch log stream for a Lambda instance: CloudWatch provides a powerful and detailed logging mechanism for AWS Lambda functions, offering insights essential for thorough debugging and performance analysis. Because a dedicated log stream is created for each instance of a Lambda function, developers can address specific debugging questions more effectively:
- Instance Identification for Requests: By examining the log entries, developers can determine if requests "A" and "B" were processed by the same Lambda instance or by different instances (see the sketch after this list). This is crucial for understanding load distribution and session consistency.
- Load Distribution Analysis: Log streams enable monitoring of how requests are distributed across the different instances of a Lambda function. This helps verify if the workload is balanced fairly among all available instances.
- Active Instance Counting: Through log analysis, developers can estimate how many instances of a Lambda function are currently active, which assists in scaling decisions and resource optimization.
- Cold Start Analysis: Logs provide data on the cold start duration for each instance. This enables a comparison to determine if the cold start duration is consistent across all instances or if variations need to be addressed.
- Instance Launch Frequency: Logs can show how frequently new instances are launched, which is crucial for understanding the behavior of the Lambda function under varying loads and for capacity planning.
- Identifying Malfunctioning Instances: If a particular instance consistently shows errors or longer execution times, it can be identified as a "bad" instance through the logs. This is critical for maintaining the overall health of the application.
- Version and Alias Tracking: Each log entry is tagged with the Lambda function’s version or alias, making it easier to track which version or alias each instance belongs to. This is particularly useful when managing multiple versions and ensuring the correct version operates.
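As a sketch of how this can be exploited in practice, a handler can assign itself an identifier once, in static initialization, so every request served by the same instance (and thus logged to the same stream) carries the same value. The field names below are illustrative, not an AWS convention:

```java
import java.util.UUID;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class InstanceAwareHandler implements RequestHandler<String, String> {

    // Assigned once per execution environment, i.e. once per cold start.
    private static final String INSTANCE_ID = UUID.randomUUID().toString();
    private static final long BORN_AT_MS = System.currentTimeMillis();

    @Override
    public String handleRequest(String input, Context context) {
        // Correlating requestId with instanceId answers "did requests A and B
        // hit the same instance?"; aggregated across streams, it also gives
        // active instance counts, load distribution, and launch frequency.
        context.getLogger().log(String.format(
                "instanceId=%s requestId=%s version=%s instanceAgeMs=%d",
                INSTANCE_ID,
                context.getAwsRequestId(),
                context.getFunctionVersion(),
                System.currentTimeMillis() - BORN_AT_MS));
        return "ok";
    }
}
```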
Strategies for Reducing the Frequency and Duration of Lambda Cold Starts
AWS provides several strategies to manage and mitigate cold starts.
1) SnapStart
Lambda SnapStart for Java can improve startup performance for latency-sensitive applications by up to 10x at no extra cost. It works by taking a snapshot of the initialized execution environment when a function version is published and resuming new execution environments from that snapshot instead of initializing them from scratch.
SnapStart overview (screenshot from the video)
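Because the snapshot captures initialized state, code holding resources that are unsafe to checkpoint (open connections, temporary credentials, random seeds) can use the org.crac runtime hooks that SnapStart supports to release and re-create them. A minimal sketch, where DatabaseConnection is a hypothetical stand-in for your own resource:

```java
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class SnapStartLifecycle implements Resource {

    // Hypothetical stand-in for a resource that must not be snapshotted.
    static class DatabaseConnection {
        static DatabaseConnection open() { return new DatabaseConnection(); }
        void close() { /* no-op stub */ }
    }

    private DatabaseConnection connection = DatabaseConnection.open();

    public SnapStartLifecycle() {
        // Register with the global context so Lambda invokes the hooks
        // around snapshot creation and restore.
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> ctx) {
        // Runs before the snapshot is taken: release state that must not
        // be captured, such as open network connections.
        connection.close();
    }

    @Override
    public void afterRestore(Context<? extends Resource> ctx) {
        // Runs after a new environment is restored from the snapshot:
        // re-establish whatever was released above.
        connection = DatabaseConnection.open();
    }
}
```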
2) Lambda Provisioned Concurrency
Provisioned Concurrency keeps functions "warm" and ready to respond to invocations instantly. It also incurs additional charges. Use provisioned concurrency if the application has strict cold start latency requirements. We can’t use both SnapStart and provisioned concurrency on the same function version.
When considering the use of Provisioned Concurrency in AWS Lambda, several critical details need to be kept in mind to ensure its effective deployment and management:
- Provisioning Time: Once Provisioned Concurrency is enabled for a Lambda function, AWS needs time to provision the specified number of concurrent executions. This setup process can take a couple of minutes. During this time, you can monitor the provisioning status in the AWS Management Console to see when the functions become available.
- Tied to Specific Versions: Provisioned Concurrency must be configured against a specific version of a Lambda function. It cannot be applied directly to the $LATEST version, which is the mutable version representing the latest uploaded code. This ensures that the behavior of provisioned instances is predictable and stable.
- Configuration on Aliases: If you configure Provisioned Concurrency on an alias, it automatically applies to the version that the alias points to. For example, if you have an alias named `canary` that points to version 10 of your function, setting up Provisioned Concurrency on the `canary` alias means that version 10 will have provisioned instances ready.
- Restrictions on $LATEST and Alias Overlap: Provisioned Concurrency cannot be set on the `$LATEST` version, nor on any alias that points to `$LATEST`. Additionally, you cannot set up Provisioned Concurrency on multiple aliases that point to the same version. This restriction is designed to avoid configuration conflicts and duplication of provisioned capacity.
- Non-Combinability: You cannot combine Provisioned Concurrency configurations on an alias and its underlying version, nor can you configure it on two aliases pointing to the same version. This prevents overlapping configurations that could lead to unexpected behaviors or excessive provisioning.
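For illustration, here is a sketch of configuring Provisioned Concurrency on an alias with the AWS SDK for Java v2; the function name, alias, and instance count are placeholders:

```java
import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.PutProvisionedConcurrencyConfigRequest;
import software.amazon.awssdk.services.lambda.model.PutProvisionedConcurrencyConfigResponse;

public class ConfigureProvisionedConcurrency {
    public static void main(String[] args) {
        try (LambdaClient lambda = LambdaClient.create()) {
            PutProvisionedConcurrencyConfigResponse response =
                    lambda.putProvisionedConcurrencyConfig(
                            PutProvisionedConcurrencyConfigRequest.builder()
                                    .functionName("my-function")
                                    // An alias or published version, never $LATEST.
                                    .qualifier("canary")
                                    .provisionedConcurrentExecutions(10)
                                    .build());
            // Provisioning takes a few minutes; the status moves from
            // IN_PROGRESS to READY once the instances are warm.
            System.out.println("Status: " + response.statusAsString());
        }
    }
}
```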
Selective Bootstrapping During the Initialization Phase
One of the effective strategies to mitigate cold starts in AWS Lambda is to employ a technique called Selective Bootstrapping. During the initialization phase of a Lambda function, two main tasks are performed:
- Bootstrapping the runtime: This involves setting up the execution environment for the function, which includes loading the runtime, the function code, and any dependencies.
- Running the function’s static code: This refers to executing any code outside the function handler, which runs only once when the Lambda function is initialized.
This initialization process runs with a temporary "CPU boost," in which Lambda increases the CPU power available to speed up initialization. Selective bootstrapping aims to optimize this initialization process based on the type of Lambda invocation: "provisioned concurrency" or "on-demand."
- For provisioned concurrency, where Lambda instances are kept warm and ready to respond instantly, a "full" bootstrapping can be performed. This means running all initialization tasks upfront to ensure the function is ready to respond to invocations with minimal latency.
- For on-demand invocations, where new Lambda instances are initialized as required, a "minimal" bootstrapping can be performed. This means running only the essential initialization tasks needed to start the function, thereby reducing the cold start time.
To achieve this selective bootstrapping, read the Lambda-provided environment variable AWS_LAMBDA_INITIALIZATION_TYPE in the static portion of the function code and branch the initialization tasks accordingly.
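A minimal sketch of this pattern, where fullBootstrap and minimalBootstrap stand in for your own eager and deferred initialization work:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class SelectiveBootstrapHandler implements RequestHandler<String, String> {

    static {
        // Set by the Lambda service: "provisioned-concurrency" or "on-demand".
        String initType = System.getenv("AWS_LAMBDA_INITIALIZATION_TYPE");
        if ("provisioned-concurrency".equals(initType)) {
            // Instances are pre-warmed off the request path,
            // so pay the full initialization cost up front.
            fullBootstrap();
        } else {
            // A real cold start: do only what the first request
            // needs, and defer the rest to keep init duration short.
            minimalBootstrap();
        }
    }

    private static void fullBootstrap() { /* eager init: caches, connections, ... */ }

    private static void minimalBootstrap() { /* bare minimum; defer the rest */ }

    @Override
    public String handleRequest(String input, Context context) {
        return "ok";
    }
}
```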
Case Study: Building a Low Latency Control Plane API Backend with AWS Lambda
In this case study, we’re exploring the development of a low-latency control plane API backend using AWS Lambda. The primary aim is to ensure quick response times, making effective management of cold starts crucial.
Context and Challenge
The control plane API backend is the brain for managing and orchestrating resources in a cloud-based application. It requires low latency to respond swiftly to API calls, ensuring smooth operation and user experience. However, serverless architectures like AWS Lambda often face the challenge of cold starts, which can introduce latency.
Strategy and Implementation
The first step was to optimize the function configuration to ensure low latency. The runtime was selected based on performance, and the memory settings were tuned to balance cost and performance. The deployment package size was minimized to reduce the function code’s download time, thus reducing cold start time.
Next, the Provisioned Concurrency feature of AWS Lambda was used. This feature keeps a specified number of function instances "warm" and ready to respond to invocations instantly. By setting an appropriate level of provisioned concurrency, we ensured that enough warm instances were always available to handle incoming API calls without incurring a cold start. A spillover invocation (an on-demand invocation beyond the provisioned level) will still take the cold path, so monitoring the provisioned-concurrency utilization and spillover metrics (ProvisionedConcurrencyUtilization, ProvisionedConcurrencySpilloverInvocations) is essential.
The backend also switched to the AWS Java SDK v2, which includes three primary changes that improve initialization time (see the AWS SDK documentation).
Finally, initialization was made eager or lazy depending on whether the container was launched as "on-demand" or "provisioned-concurrency," as described in the selective bootstrapping section above.
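As one example of those SDK-level improvements, the AWS documentation recommends specifying the region, credentials provider, and HTTP client explicitly so the SDK can skip its default lookup chains during initialization. A sketch with a hypothetical DynamoDB client:

```java
import software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider;
import software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;

public class SdkClients {
    // Building the client in static scope moves its cost into the init
    // phase, where Lambda's CPU boost applies.
    static final DynamoDbClient DDB = DynamoDbClient.builder()
            // Explicit region avoids the default region-lookup chain.
            .region(Region.of(System.getenv("AWS_REGION")))
            // Explicit provider avoids the default credentials-provider chain.
            .credentialsProvider(EnvironmentVariableCredentialsProvider.create())
            // UrlConnectionHttpClient initializes faster than the default Apache client.
            .httpClientBuilder(UrlConnectionHttpClient.builder())
            .build();
}
```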
Results
By implementing these strategies, the control plane API backend could manage cold starts effectively. The latency of the API backend was significantly reduced, with most API calls being handled by warm Lambda instances. Even when cold starts did occur, their duration was minimized through careful function configuration.
Conclusion
This case study demonstrates that, with proper configuration and management strategies, AWS Lambda can effectively handle the low latency requirements of a control plane API backend. The key is understanding the behavior and characteristics of Lambda cold starts and using the features and strategies available to mitigate their impact.