Practical Performance Tuning for Serverless Java on AWS

Summary

AWS Hero Vadym Kazulkin explains how to overcome Java’s enterprise hurdle on AWS Lambda: cold starts and memory footprints. He shares a technical deep dive into performance tuning, comparing fully managed AWS SnapStart (with pre-snapshot priming hooks) against GraalVM ahead-of-time compilation, while addressing the latest architectural implications of Project Leyden and Java 25.

Bio

Vadym Kazulkin is AWS Serverless Hero and Head of Development at ip.labs GmbH. His focus and interests currently include the design and implementation of highly scalable and available applications in AWS Cloud with the special passion for Serverless. Vadym is also the co-organizer of the Java User Group Bonn meetup and a frequent speaker at various Meetups and conferences.

About the conference

InfoQ Dev Summit Munich software development conference focuses on the critical software challenges senior dev teams face today. Gain valuable real-world technical insights from 20+ senior software developers, connect with speakers and peers, and enjoy social events.

Transcript

Vadym Kazulkin: My name is Vadym. Serverless and Java are both my passions. I'm one of the organizers of Java User Group Bonn, where I currently live. Also, AWS Hero. I speak and blog quite frequently about various topics, this is only one of them. Basically, if we're talking about Java popularity in general, there are so many sources that try to measure it differently, like number of job postings, number of GitHub repositories with Java code, Stack Overflow questions. It just doesn't matter what you take, Java will be basically one of the most popular programming languages. Maybe not number one, but number two, number three for sure. If we look into the world of serverless Java on AWS, we currently see, basically on AWS Lambda, Amazon has its own Java distribution, Amazon Corretto, that they patch and support with security fixes beyond the usual life cycle of long-term support, which is good, but maybe also not, because they will be supporting Java 8 until 2030.

It's just for those who have difficulties to jump over this mountain. What happened? They only support long-term support versions on Lambda, and they started to support Java 21 two months after the release. We are now three weeks after the release of Java 25, so I hope it will be supported on Lambda within the next month or two. It's not up to me. Currently, we have support from Java 8 to Java 21, which is fine. Java 25 will be also amazing. Java is for sure a very fast and mature programming language, and it has a huge community and open-source projects, Spring, Quarkus, all the way to AI projects now. It's really amazing. If you look at how Java is adopted, how many people write Lambda functions with the business logic on serverless with Java, we have currently two surveys that we can refer to.

They are both a bit old. I would expect the next will come this year, but you see the adoption of Java on AWS Lambda is something like single-digit percent. Most Lambda functions are written in Python and Node.js, and Java is lagging behind. I assume it's better now, but there will still be a gap. The question is, of course, why? Because the adoption of Java in the enterprise is quite different.

Serverless with Java - Challenges

There are basically two things that we will cover here. There will be so-called cold start times or latency startup times of the things on AWS Lambda, and we'll be also looking to the memory footprint, which is one of the cost factors on AWS Lambda. Let's start with something basic and simple, and then we will see how to move on from this. This is the very simplified version of the application that we use at our work, which has 200 Lambda functions. I just dumbed down to two. We basically have API gateway for managed API, products API, and for simplicity, only two Lambda functions for creating the product and getting product by ID. I will use completely serverless database, DynamoDB. I will say some words if you use something like managed Postgres or something like this, but we at the company that I work prefer to stay as serverless as possible.

There are certain challenges around the stuff with the relational databases that I will also say. Basically, if the request comes through API gateway, which is the managed API, then what Lambda receives in the end is JSON body. You will see here the JSON structure. Basically, you will see all query parameters, path parameters, body if you have post requests, headers, all that stuff that will basically come as a JSON, and it will be more or less deserialized into this kind of stuff. This is how you will write Lambda function. This is GetProductById, which I showed you, it basically implements RequestHandler with input and output, and we have only one method, handleRequest, which gets this JSON deserialized into API gateway proxy request event, because the trigger is API gateway.

There are other triggers. Basically, from this request event, we can get the parameter ID, which we will send, and then we will talk to DynamoDB to get the product with that ID. This is basically how you will write such a Lambda function, and how this JSON will be then dumbed down into this API gateway proxy request event, and output is proxy response event, where basically put HTTP code message and then serialize the product as an outcome output.
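The request flow described above can be sketched without any AWS dependencies. This is only an illustration: the real function implements RequestHandler with APIGatewayProxyRequestEvent and APIGatewayProxyResponseEvent, while here the event is modeled as a plain Map with the same `pathParameters`, `statusCode` and `body` fields, and the DynamoDB-backed DAO is replaced by a hypothetical in-memory map.

```java
import java.util.Map;

// Dependency-free sketch of the GetProductById flow. The Map-based event and
// the in-memory product store are assumptions made for illustration only.
final class GetProductByIdSketch {

    // Hypothetical stand-in for the DynamoDB-backed ProductDao.
    private static final Map<String, String> PRODUCTS_JSON =
            Map.of("1", "{\"id\":\"1\",\"name\":\"Sample product\",\"price\":9.99}");

    static Map<String, Object> handleRequest(Map<String, Object> event) {
        // API Gateway passes path parameters as a nested JSON object.
        Object pathParameters = event.get("pathParameters");
        Object id = (pathParameters instanceof Map)
                ? ((Map<?, ?>) pathParameters).get("id")
                : null;
        if (id == null) {
            return response(400, "Missing path parameter: id");
        }
        String productJson = PRODUCTS_JSON.get(id.toString());
        if (productJson == null) {
            return response(404, "Product with id " + id + " not found");
        }
        return response(200, productJson);
    }

    // Mirrors the two response fields mentioned in the talk: HTTP code and body.
    private static Map<String, Object> response(int statusCode, String body) {
        return Map.of("statusCode", statusCode, "body", body);
    }

    public static void main(String[] args) {
        System.out.println(handleRequest(Map.of("pathParameters", Map.of("id", "1"))));
    }
}
```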

Cold Start

Now let's come to the biggest challenge, which is still there, but we will see how to deal with that, this is the cold start. First of all, what is the cold start? Because probably not everybody is familiar with that. Lambda implements the so-called Function as a Service paradigm, and this is quite different because the execution environment of this Function as a Service is only capable, and currently this is the Lambda model, of handling one request at a time. It's not what we know from Tomcat and so on, where we can send many requests to one instance, and only if it cannot handle them do we need to scale horizontally. One Lambda environment is only capable of handling one request at a time. Then, what does it mean? That basically means we need to start this environment, and when the environment has been used, it goes back to the pool of environments.

If there are not enough environments, then other environments need to start. It reminds me a bit of how database connection goes to the data pool. It will be used and put back to the pool. Somebody else requires that, then it will be fetched from the pool. Of course, it's about the size and all that stuff. The thing is when we need a new pool, or AWS is responsible for starting a new pool, for example, if the Lambda function will be invoked for the first time. It might happen if you wrote the function for a new function or you updated the existing function. That needs to be redeployed, and of course, it's a new code. All environments need to be destroyed. Basically, then we need a new environment with that. It might be the case that you didn't touch the code, but you had constantly five requests at a time, and there were five execution environments, now you have 10 requests.

Then we need five new environments in parallel, because those five cannot handle this, and AWS doesn't want you to have latency, so they will not queue it. They will try to bring more environments live, so more concurrent invocations. There are also situations where AWS destroys an existing environment. It might be for cost-saving reasons. For example, they started 10 because you needed 10, and five minutes later, you have once again only five requests. Over time they will shut down the other five, because it's their cost. They don't charge us for what we are not using, but they have the cost of environments running. They will shut them down. There are situations where they need to patch those environments for security risks and so on. They don't say exactly when it happens, but in my experience, no environment survives more than several hours.
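To make the pool behavior concrete, here is a small simulation, a sketch under simplifying assumptions rather than how AWS actually schedules environments. Time is split into discrete steps, each environment serves one request per step, and an environment is reclaimed after it has been idle for more than `maxIdleSteps` (a made-up parameter). Every environment that has to be created counts as one cold start.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of the execution environment pool: one request per environment
// at a time, idle environments are reclaimed, missing capacity means cold starts.
final class EnvironmentPoolSketch {

    static int coldStarts(int[] concurrentRequestsPerStep, int maxIdleSteps) {
        List<Integer> lastUsedStep = new ArrayList<>(); // one entry per live environment
        int coldStarts = 0;
        for (int step = 0; step < concurrentRequestsPerStep.length; step++) {
            final int now = step;
            // Reclaim environments that have been idle for too long.
            lastUsedStep.removeIf(lastUsed -> now - lastUsed > maxIdleSteps);
            // Reuse the most recently used environments first.
            lastUsedStep.sort(Comparator.reverseOrder());
            int needed = concurrentRequestsPerStep[step];
            int reused = Math.min(needed, lastUsedStep.size());
            for (int i = 0; i < reused; i++) {
                lastUsedStep.set(i, now);
            }
            // Every request without a warm environment triggers a cold start.
            for (int i = reused; i < needed; i++) {
                lastUsedStep.add(now);
                coldStarts++;
            }
        }
        return coldStarts;
    }

    public static void main(String[] args) {
        int[] load = {5, 10, 5, 5, 5, 10};
        System.out.println("cold starts, idle limit 2 steps: " + coldStarts(load, 2));
        System.out.println("cold starts, no reclaiming:      " + coldStarts(load, 100));
    }
}
```

With the load from the talk (five requests, then ten, then five again), the second burst to ten only causes new cold starts if the extra five environments were reclaimed in the meantime.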

They will shut down that for just preventing security attacks and so on. There is no constant pool of those workers. With time, they will all be gone, and that's why we will need to start them. What does start mean? What is the workflow of that? First of all, if we deploy a Lambda function, it just doesn't matter if it's Java or something else, the function code will be uploaded to S3 on Amazon. If we start the environment, this code needs to be downloaded first. AWS has its own technology like Firecracker microVM, which more or less is an execution environment, and they manage them. You can think of managing Kubernetes environments. This is some kind of the execution environment. We don't need to know the details. Basically, if we select Java, then of course this execution environment needs to start Java runtime, this Corretto version that we selected.

Then what they will do, and you see here, start execution environment, the static initializer code block will be executed. This means everything that is in the static initializer block of the Lambda function, and everything that is reachable from that. Because I initialized here ProductDao, which is DynamoDB, it has its own initialization, so everything will be loaded from this static initializer block. That means class loading, runtime dependency injection, just-in-time compilation may kick in. I use here pure Java, if you use something like Spring, then there's annotation processing going on. Everything that is reachable for the static phase will be executed, and the last step, the handler method, will be invoked. The handler method is the function code itself, the function, handler request. The first three steps are so-called cold starts, or start time, startup time of the application.
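The split between the initialization phase and the handler invocation can be illustrated with plain Java. This is a minimal sketch, not the code from the slides: the counters and the `CLIENT` field are hypothetical and only show that the static initializer runs once per execution environment, while the handler method runs on every invocation.

```java
// Sketch: static initialization happens once per JVM (part of the cold start),
// the handler method runs for every request (the warm path).
final class InitVsInvokeSketch {

    static int initRuns = 0;
    static int handlerRuns = 0;

    // Stand-in for an expensive client such as the DAO built in the static block.
    static final StringBuilder CLIENT;

    static {
        initRuns++;
        CLIENT = new StringBuilder("client-ready");
    }

    static String handleRequest(String id) {
        handlerRuns++;
        return CLIENT + ":" + id;
    }

    public static void main(String[] args) {
        handleRequest("1");
        handleRequest("2");
        System.out.println("init runs: " + initRuns + ", handler runs: " + handlerRuns);
    }
}
```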

The last step, AWS calls it warm start, but it's basically only the execution of the function. Because people in the Java world call warm start basically if Java reaches peak performance, like just-in-time compilation, did it work? It's not about this, it's just about the function itself. The warm start is basically only the execution of the function, and it happens when the warm container is there. Then only the last step will be done. If there is a need to start the container, then all three steps before need to be executed, and this is the cold start. Now let's see what this cold start means in terms of performance. Basically, I started with some basic, which is more or less good enough, which we can use as a starting point. I gave Lambda function 1 gig of memory.

People maybe know that you can only give memory to the Lambda function in AWS where you write business logic, the CPU will be derived from memory. You cannot give CPU, you give only memory, and basically proportionally more memory will get proportionally more CPU. With that 1 gig memory you have half of the virtual CPU, which is in most cases enough. x86 architecture, this is default, there is also ARM. To talk to DynamoDB, we need a HTTP client. If we don't do anything, the default one will be Apache. This application has 14 megabytes in size, this is what I measured. We use certain Java compilation options, which is more suitable for serverless world, like we disabled tiered compilation and used basically stop compilation, the first level. I will say some words about that later.

I did something like stress test for one hour. I just put a certain amount of requests and I just waited for 8, 9 minutes, then I just started another request. Basically, in this hour I had experienced 100 cold starts and 100,000 warm starts. You will see these images for cold start and warm start later. This was a completely newly deployed application, which will also play a role. Now let's measure what will we have. Let's start with something positive, this is the warm start, and you see the values are in milliseconds. Basically, if you ignore this max value, and I will explain why it's so big, you see that Java is a very quick programming language. I always look at p90, so 90% of the cases the values are like this, or maybe a bit less.

To get the item from DynamoDB, which is a very simple Lambda function of course, but 7 milliseconds is a good thing. It's just quick, you don't notice that. If you look into the cold start, if the cold start happens, then you see those values and you see this is around 3 seconds for Java. The first thing, maybe now to go back, you saw that there have been only less than 0.1% of the cold starts. This is a good message that those cold starts don't happen very frequently. You will not be impacted by these 3 seconds frequently, but I only tested one Lambda function. Your application might use many Lambda functions to load the page, and if only one will have the cold start and you need the whole chain, then the probability will increase.

Generally, we see 2% to 3% cold starts as a normal thing. If 2% or 3% of the time you have a cold start of 3 seconds, you probably know Google's study, they say with more than 1 second, people will drop off, say if it's a login function. It might happen 2% to 3% of the time. That's a business impact, or it might be. If it's an asynchronous application, you don't care, but if it's a public facing application, you do.
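The point about chains of Lambda functions is simple arithmetic. As a rough sketch, assuming each function hits a cold start independently with the same rate (an assumption, real traffic is correlated), the chance that at least one of n functions in a request chain is cold is 1 - (1 - p)^n:

```java
// Probability that at least one function in a chain experiences a cold start,
// assuming independent cold starts with the same per-function rate.
final class ColdStartChainSketch {

    static double atLeastOneColdStart(double perFunctionRate, int functionsInChain) {
        return 1.0 - Math.pow(1.0 - perFunctionRate, functionsInChain);
    }

    public static void main(String[] args) {
        double rate = 0.02; // 2% cold starts per function, as mentioned in the talk
        for (int n : new int[] {1, 3, 5, 10}) {
            System.out.printf("%d functions -> %.1f%% of page loads see a cold start%n",
                    n, 100 * atLeastOneColdStart(rate, n));
        }
    }
}
```

With a 2% rate and five functions in the chain, roughly one page load in ten would include at least one cold start.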

Options to Reduce Cold Start Time

Let's talk about the optimization techniques. I will be talking about two of them, SnapStart and GraalVM. SnapStart is AWS's own technology. Let's start with that. Basically, the goal of SnapStart, AWS saw the adoption of Java, they saw the reasons, cold starts, and they said, we need to do something about that. We are losing the cake. Java developers don't use serverless, or not so frequently, and that's a challenge. Yes, they developed the SnapStart, the goal is to improve startup performance. It's fully managed. I will show you what it means. It started with Java runtimes. It's available starting from Java 11, but they added Python and .NET end of last year. Python, for obvious reasons, to load LLMs and to decrease the startups, and .NET has basically the same problems. It's only available for managed runtimes like Java, Python, and .NET.

You can also deploy a Docker container image on AWS Lambda, or a custom runtime. This is how we will ship GraalVM. SnapStart is not available for those. You cannot improve that. What is SnapStart? Do people know the CRaC and CRIU technologies? Do those names mean anything to you? There are technologies to port containers like CRIU. They have existed for 12, 13 years, just to port an application from one Docker container to another. CRaC is the technology to checkpoint and restore the JVM. AWS did more or less their own thing, but it's similar. What happened? They divided the workflow of Lambda into two phases. First of all, without SnapStart, nothing happened when we deployed. We simply deployed the Lambda function, and until invocation, nothing happened. If we activate SnapStart for the Lambda function, there will be additional steps. During the deployment phase, AWS will start the container with Java, download the code, and execute the cold start phase of the application.

They will download the code, start the JVM, and basically run the static initializer block. They cannot execute, of course, your Lambda function. They don't have the payload. They don't know what you want, product with ID 1, 2, 3, but they can go until this point with that cold start phase with the goal to create the snapshot of the application. The snapshot contains everything: the loaded JVM, the classes loaded until then, and all that stuff. The snapshot is the complete snapshot of the microVM. It includes everything: Linux as an operating system, JVM, and everything from your code that was loaded until then. During the invocation phase, in case you activate SnapStart, you will experience the restore of this snapshot. The snapshot will be restored, and then the execution flow will resume.

The promise here is that the restore phase is quicker than the cold start. Otherwise, it simply doesn't make any sense. You can then measure and decide for your operation. They think, ok, the snapshot, this is completely microVM snapshot, will be quicker, and we will measure whether it's true or not. Basically, you go to the Lambda function, and you decide, would you like to have it or not, this SnapStart? It's basically true or false. It's one line more in the infrastructure as code, but that's it. This is the flag, and you decide. Before I show you the measurements, I would like just to present one more technique, which is called priming, which you can do with SnapStart. Once again, we show that AWS basically starts the container, loads more or less this handler class, but doesn't execute the function.

There is a possibility to load as much as possible on classes or pre-initialize stuff so that it becomes part of the snapshot. Do as much as possible that it's in the snapshot, because the restore will be better. The thing is, we will show that Java loads the classes lazily, partially. There are things that are common knowledge, what we can do, especially if we talk to things like DynamoDB. Once again, DynamoDB invocation, there is the HTTP client behind. Default one is Apache. If you go just to your editor and measure the time it takes to initialize one time HTTP client, you will notice, half a second. Because it's invoking a lot of instantiation, caching, and so on, Apache is more for long-living applications like Tomcat. There is a chance to pre-initialize this Apache client, that the pre-initialization becomes part of the snapshot.

The same is true, talking to DynamoDB involves JSON Marshaller or Unmarshaller, because you're dealing with Java object and DynamoDB is NoSQL. You serialize that in JSON or back, depending if you are writing or reading. The default serializer is Jackson. If you execute new ObjectMapper, now in your IDE for the first time, execution time, depending on how much CPU you have, will be 300 to 500 milliseconds. If you do it for the second time, 1 millisecond. If you look into the code below, there are so many singletons that will be loaded only for the first time of the life cycle. The big question is, can we pre-initialize that stuff? If we can pre-initialize, how can we do this? You see these orange boxes, there are possibilities to do that with hooks. Let's ignore this post-snapshot hook, I don't use it, but this pre-snapshot hook is very important.

This is the additional method. I will show you how to implement this. There we can put additional logic to pre-initialize the stuff that will become part of the snapshot. Priming is exactly this. We are faking pre-initializing the stuff. It's maybe not very convenient, but it's a good way just to speed up things. Once again, you saw that the warm start, the max value even in the warm start was one and a half seconds, I showed you, 1500 milliseconds even in the warm start. Now I will explain why. Because even if we execute the function itself, the handler, getProductById, and it asks DynamoDB for the first time, then Java loads the classes when it first needs them. For example, this getProductById in DynamoDB ProductDao, it will initialize the getItemResponse.

This is the way to talk to DynamoDB. All these classes in this chain will be initialized for the first time even in the warm execution phase. Because in the static block, we only initialize the DynamoDB DAO class, so only the classes reachable from the static initializer block will be loaded. This getProductById, all this logic, including ObjectMapper, which we only use when we talk to the DAO, this all happens in the warm start time. Even in the warm start phase, the first execution might also be slow because the classes need to be loaded. How to deal with that? Basically, you need to include one more dependency, which is this org-crac. This org-crac gives the possibility to implement the Resource interface, which comes from this org-crac. Basically, you need to register this Lambda function or other Lambda functions as a CRaC resource by copying this line, Core.getGlobalContext().register(this).

You simply copy this. The most important method is beforeCheckpoint, which comes from the Resource interface that you need to implement. You can do the loading stuff in this method, and it will become part of the snapshot. What I'm doing here, I'm basically executing ProductDao, getProductById with ID 0. I'm not interested in the result. By executing this, the HTTP client will be loaded, which is required to talk to DynamoDB, and the JSON Marshaller will be loaded just to deserialize JSON into a Java object, into this product. That's enough. I'm not interested in the result. I preloaded the stuff. The classes are loaded. The things I initialized become part of the snapshot. After restore, I don't do this. The nice thing is that with SnapStart, talking to DynamoDB involves an HTTP connection.

It will be restored for you. You don't need to write the code. It survives snapshot and restore. The same as if you are talking to Postgres through JDBC, it will also survive. You don't need to write the code. This is the magic of SnapStart. Of course, it requires common knowledge that sometimes you don't use DynamoDB. You need to think, what can I prime? If you do something like A plus B, you probably cannot prime anything. What should you prime in this? It's simple. You probably don't need SnapStart, and you won't have cold starts. There is a SnapStart Priming guide that comes from AWS. They describe what you can do. The best thing you can do is preload as many classes as possible that will be loaded lazily, that are not part of the static initializer block. It might also save you things. With DynamoDB, there is more.
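The priming pattern described above can be sketched without external dependencies. Important assumptions: `Resource` and `Context` below are local stand-ins that only mirror the shape of `org.crac.Resource` and `org.crac.Context`, `SimulatedGlobalContext` imitates what the platform does before taking the snapshot, and `ProductDao` is a hypothetical DAO whose lazily created `client` stands in for the HTTP client and JSON mapper. In a real function you would import the org.crac types and register with `Core.getGlobalContext().register(this)`.

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch of SnapStart priming via a pre-snapshot hook.
final class PrimingSketch {

    // Stand-in mirroring the shape of org.crac.Resource.
    interface Resource {
        void beforeCheckpoint(Context<? extends Resource> context) throws Exception;
        void afterRestore(Context<? extends Resource> context) throws Exception;
    }

    // Stand-in mirroring the shape of org.crac.Context.
    interface Context<R extends Resource> {
        void register(R resource);
    }

    // Remembers registered resources and replays the pre-snapshot hook.
    static final class SimulatedGlobalContext implements Context<Resource> {
        private final List<Resource> resources = new ArrayList<>();

        @Override
        public void register(Resource resource) {
            resources.add(resource);
        }

        void simulateSnapshot() {
            for (Resource resource : resources) {
                try {
                    resource.beforeCheckpoint(this);
                } catch (Exception e) {
                    throw new IllegalStateException("priming failed", e);
                }
            }
        }
    }

    // Hypothetical DAO: the client is created lazily on the first call.
    static final class ProductDao {
        private Object client;

        boolean isClientInitialized() {
            return client != null;
        }

        String getProductById(String id) {
            if (client == null) {
                client = new Object(); // expensive one-time initialization in real code
            }
            return null; // no product found; irrelevant for priming
        }
    }

    static final class GetProductByIdHandler implements Resource {
        final ProductDao productDao = new ProductDao();

        GetProductByIdHandler(Context<Resource> context) {
            context.register(this);
        }

        @Override
        public void beforeCheckpoint(Context<? extends Resource> context) {
            productDao.getProductById("0"); // priming call, result ignored
        }

        @Override
        public void afterRestore(Context<? extends Resource> context) {
            // nothing to do after restore in this sketch
        }
    }
}
```

The design choice here is the same as in the talk: the hook performs a throwaway call through the real code path, so whatever that path initializes lazily is already in memory when the snapshot is taken.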

Let's look into the measurements. On the left side, this is what we saw. This is the cold start, from 3 to 4 seconds. This is with the percentiles. The second one is SnapStart but without priming. Just simply checkbox, I would like to have SnapStart. The third one is SnapStart with priming. In this case, this product, getProductById 0. One line of code. What you see, I always look into this yellow one, this p90. It's good enough. Only by activating SnapStart, which is basically checkbox, the cold start goes down from 3.2 seconds to 2 seconds. Just 40%, something like this. By doing the priming, you see 1 second. It goes down. Restoring from cache is generally quicker, as usual. It requires not too much code, at least for this scenario.

This is the warm start, only until p99. SnapStart is basically not the technology for improving the warm start, the execution time. What you see, the max value is now very low because I pre-initialized the stuff. I pre-initialized everything. I pre-loaded the classes. Even the max value of the warm start decreased a lot. We could do basically the same by faking getProductById without using SnapStart and static initializer block. We could do it as well. We will achieve the same result, only for the warm start. For the cold start, we need something like SnapStart.

Lambda Performance Tuning Approaches

This is what I showed you. Interesting enough. We did that with basically this default configuration. There is a possibility to tune this stuff. I will only briefly show you what to do, what to test out. We gave the Lambda memory of 1 gigabyte. You can tweak. You can give more or less memory and see what will be the cold start. Less memory might mean less cost because memory is one of the cost factors, but you will get less CPU. You need to play around. What's enough? You can give more memory, basically until 1.8 gigabytes because after this you will get the second core, and if the application doesn't benefit, then it doesn't matter. Then you can give more memory. It might cost you more, but then the performance will be better. This is what you need to test.
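The memory and CPU trade-off can be estimated with a few lines of arithmetic. This is a rough sketch: AWS documents that a function has the equivalent of one vCPU at 1,769 MB, and the linear scaling below that point is an approximation of "proportionally more memory gets proportionally more CPU". The compute charge is modeled only as GB-seconds, without any concrete price.

```java
// Back-of-the-envelope sizing helpers for Lambda memory settings.
final class LambdaSizingSketch {

    // Per AWS documentation, 1,769 MB corresponds to one vCPU.
    static final double MB_PER_VCPU = 1769.0;

    // Approximate CPU share, assuming linear scaling with memory.
    static double approxVcpus(int memoryMb) {
        return memoryMb / MB_PER_VCPU;
    }

    // Billed compute in GB-seconds for a single invocation.
    static double gbSeconds(int memoryMb, long durationMs) {
        return (memoryMb / 1024.0) * (durationMs / 1000.0);
    }

    public static void main(String[] args) {
        for (int memoryMb : new int[] {512, 1024, 1769, 2048}) {
            System.out.printf("%d MB -> ~%.2f vCPU%n", memoryMb, approxVcpus(memoryMb));
        }
        // Doubling memory costs the same if the duration halves.
        System.out.println(gbSeconds(1024, 1000) + " vs " + gbSeconds(2048, 500));
    }
}
```

This is why testing matters: more memory only pays off if the extra CPU shortens the execution enough to offset the higher per-millisecond cost.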

I showed you this compilation option with stop at level 1. Why is that a good one? Because Lambda functions are short-lived, and by default Java uses tiered compilation, which basically waits until a method has been executed something like 10,000 times before it optimizes it aggressively with JIT. This is not what will happen with a Lambda function. The container will be gone before that optimization kicks in. The just-in-time compiler requires threads that do the optimizing, and it eats from your CPU. That's why, by using another Java compilation option, we just didn't optimize that aggressively, which freed the CPU for us, and it was better. HTTP client implementations, it's not only Apache. AWS has its own native one. You can test it. Or there is the URL connection HTTP client, not only the Apache one.
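On Lambda, a setting like this is commonly passed through the `JAVA_TOOL_OPTIONS` environment variable, for example `-XX:+TieredCompilation -XX:TieredStopAtLevel=1`. The following sketch shows one way to check at runtime whether the running JVM was started with that flag. The helper name `stopsAtC1` is made up for this example, and the check only inspects the argument list the JVM reports.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Checks whether the JVM arguments limit tiered compilation to level 1 (C1 only).
final class JitFlagsSketch {

    static boolean stopsAtC1(List<String> jvmArguments) {
        return jvmArguments.contains("-XX:TieredStopAtLevel=1");
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with, as reported by the runtime MXBean.
        List<String> jvmArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArguments);
        System.out.println("Stops at C1: " + stopsAtC1(jvmArguments));
    }
}
```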

In my experience, if you use priming like this, it doesn't matter anymore because everything is preloaded. If you can prime, then the results are very close. Architecture, you can test if ARM is better. ARM is cheaper if you select the ARM architecture, but I had bigger cold starts with that. It's also, once again, a price-performance thing. It's cheaper, but the performance degrades, and so on. Or maybe you need to think about your own application. Are there any priming techniques and so on?

Best Practices and Recommendations

What's important anyway, additionally, pay attention to the size of your application, especially if you use something like Tomcat, we push dependencies because it starts once and maybe survives weeks. It just doesn't work like this in the serverless world. We have this application, 14 megabytes with a medium size. This is what we tested. I wrote the application, which is Hello World. Nobody uses that. It doesn't have any dependencies. The size was 130 kilobytes. Our application was 14 megabytes, and I just wrote the application, which uses tracing and all that stuff. It was 50 megabytes. You can see here the differences in the cold start times, and you see even with SnapStart and even with priming, the size matters. The bigger the size, the bigger the cold start because more dependencies, and they need to be snapshotted and restored, and it always takes more time.

Take care of your dependency. If it's test dependency, just don't include it because it will be loaded. It needs to be downloaded from S3 and all that stuff. Really put only the things that you need and nothing else because otherwise you will be penalized. There is one more thing here. I measured the performance and what I showed you for all 100 executions. I have simply redeployed the application. The thing is, how this snapshot works is very important. AWS stores it in three availability zones in so-called tiered low-latency cache. We don't need to care about it, but it's interesting to see this stuff. This is the chunked cache, so they take this whole snapshot, chunk it into 512 kilobytes, and store them. Of course, if you deploy the application for the first time, the cache is empty.

The thing is they try to figure out, because it's also the cost for them, how to optimize the snapshots for you. For example, if you invoke Lambda functions frequently, this cache will be on the instance of where the Lambda is deployed. If your Lambda functions are executed less frequently, there is a fleet of caches for this, but the distance increases. The last source of truth is S3. If you have a less frequently invoked function, then it will be on S3. Of course, the logic here is, the nearer it is to the instance, the quicker the restore will be. It's how it is. The same as with memory. Probably you know L1, L3, RAM, the same thing. These are caches. The nearer to the core, then the better. You cannot basically control this, and this is how it is.

Of course, the more executions are, then the better the chances that it will be on L1 cache. AWS also traces the execution path. If you have if statement in the Lambda function, they try to figure out what to preload, depending on your payload. If ID is below 10, then I do this, and ID more than 10, I do that. They watch for this, and they try to preload these chunks asynchronously for you, because you normally don't need the whole chunk at the beginning. This is the whole magic. You shouldn't even be aware of this. This is the important thing, that sometimes you might have different results, depending on the Lambda function, and it's important.

The factors here are the frequency of execution and the size of the image. One function might have more dependencies, and that's why the image size is bigger, and that's why you might have a bit lower performance for that. You cannot configure this, but it's good to know that smaller and more frequently called functions will be favored. This is how it works. Yes, so infrequently accessed ones will go to the L2 cache or even to S3. That's basically what I said. I wrote an article series. I will provide links in my slides, and you can read about this. I recommend this talk, given at InfoQ in San Francisco by Mike Danilov, now a former engineer, "AWS Lambda under the Hood." He explained not only how Lambda works, but how SnapStart works. It deserves a whole talk, and especially how this cache works. It's just good to know, once again, you cannot configure anything.

The most important property of this cache, it becomes faster with time because when you start executing the Lambda function, they place these chunks of your function into the cache, and the cache was empty in the beginning. They rearrange the stuff, so the more frequently you execute, the more cache is filled, and the quicker it will be. What I did here, I compared all 100 executions after the redeployment only with the last 70. I dropped the first 30 executions for the last comparison, and basically, it's only valid if you activate SnapStart. What you see here, the comparison is SnapStart without priming for all 100. This is for SnapStart without priming for last 70, and this is SnapStart with priming the same.

If you look here, if you can prime, you see that with priming p90 for all 100 was 1.2 seconds, and here it is 650 milliseconds. Basically, what it says, the last 70 executions were quicker, and the first 30 were a bit slower. Now comes, how long does your Lambda function live? How frequently do you redeploy it? If you don't touch it for weeks, you will experience maybe 10,000, 100,000 cold starts. The first 30, which are slow, don't count. They are only a bit slow. This might be even the better measurement. This is the message, don't stop measuring with SnapStart and priming after several measurements. Measure until the end. You will see the benefits of the cache. You see, of course, from a certain point, it will not get quicker once the cache is filled. It cannot be, but basically, for the last 70, you see p90, 700 milliseconds, and that's ok for a cold start. For JavaScript and Python, you will have something like 300 milliseconds, 400 milliseconds. Here, we can get to 700 milliseconds, but it's not 3 seconds anymore. That's important.
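The comparison between all measurements and the later ones can be reproduced with a small percentile helper. This is a sketch using the nearest-rank method (one of several percentile definitions, and not necessarily the one behind the charts in the talk). The numbers in `main` are synthetic values invented for illustration, not the speaker's measurements.

```java
import java.util.Arrays;

// Nearest-rank percentile plus a helper to drop the first N warm-up samples.
final class PercentileSketch {

    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double[] dropFirst(double[] values, int n) {
        return Arrays.copyOfRange(values, n, values.length);
    }

    public static void main(String[] args) {
        // Synthetic cold start times in ms: the first 30 after a deployment are slower.
        double[] coldStartsMs = new double[100];
        for (int i = 0; i < coldStartsMs.length; i++) {
            coldStartsMs[i] = (i < 30 ? 1200 : 650) + (i % 10) * 10;
        }
        System.out.println("p90 all 100: " + percentile(coldStartsMs, 90));
        System.out.println("p90 last 70: " + percentile(dropFirst(coldStartsMs, 30), 90));
    }
}
```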

AWS Lambda Profiler Extension for Java

You can look into what's happening in your Lambda function. AWS released a profiler for that. I wrote an article, I will show you. It's a so-called profiler extension, but it's based on the open-source async-profiler. You can look, can you optimize? These are the flame graphs, and you can see where the time is spent. What is loaded when? If you see something, you can preload this stuff, and prime this stuff. You can game the thing. This is what I did when this profiler was released. I asked myself, can I prime something here? Because previously I didn't have any influence, any impact on the deserialization of this. AWS deserializes the payload into this object, but can I do something about this? This flame graph helped me because I saw what happens until my handler is executed.

Not everything was public, only this Lambda serializer, serializerFor, was in a public library, and I could do something about this. I extended my code. It's a lot of code. I'm not saying you need to use it, but basically, I used this serializerFor. I saw what they do, and they do lots of stuff. I just primed that. I used the before-checkpoint hook for this. I faked a minimalistic API Gateway request. I only set GET and, in the parameters, id equal to 0, and went through this priming. With that I, once again, improved the cold start by 20%. Once again, it's a bit more code to write, and it's up to you whether you want to have this custom code. Of course, if you have something like put product, then you set POST here, and you can send the body, so you can do everything. It's much more additional code, but this flame graph helped me a lot. I even achieved a 25% reduction because of that.
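The shape of this priming step can be sketched without any AWS dependency. The sketch below is an illustration under stated assumptions, not the speaker's code: `SnapshotHook` is a local stand-in for the CRaC `Resource` interface that SnapStart runtime hooks are based on, `FakeRequest` is a hypothetical, heavily reduced stand-in for the API Gateway proxy event, and the injected `deserializer` function stands in for the Lambda event serializer mentioned above.

```java
import java.util.Map;
import java.util.function.Function;

/** Local stand-in for the CRaC Resource interface, so the sketch needs no dependency. */
interface SnapshotHook {
    void beforeCheckpoint();
    void afterRestore();
}

/** Hypothetical, heavily reduced request type standing in for the API Gateway proxy event. */
final class FakeRequest {
    final String httpMethod;
    final Map<String, String> pathParameters;

    FakeRequest(String httpMethod, Map<String, String> pathParameters) {
        this.httpMethod = httpMethod;
        this.pathParameters = pathParameters;
    }
}

final class PrimedHandler implements SnapshotHook {
    /** Minimal fake GET request with id=0, mirroring the request described in the talk. */
    static final String WARMUP_JSON =
            "{\"httpMethod\":\"GET\",\"pathParameters\":{\"id\":\"0\"}}";

    // Assumption: stands in for the serializer that turns the JSON payload into the event object.
    private final Function<String, FakeRequest> deserializer;
    int primedCalls = 0;

    PrimedHandler(Function<String, FakeRequest> deserializer) {
        this.deserializer = deserializer;
    }

    /** Regular request handling path. */
    String handleRequest(FakeRequest request) {
        return "product " + request.pathParameters.get("id");
    }

    /** Runs before the snapshot is taken: push one fake request through deserializer and handler. */
    @Override
    public void beforeCheckpoint() {
        FakeRequest warmup = deserializer.apply(WARMUP_JSON);
        handleRequest(warmup);
        primedCalls++;
    }

    @Override
    public void afterRestore() {
        // Nothing to rebuild in this sketch.
    }
}
```

The design point is that everything executed inside `beforeCheckpoint` ends up loaded and initialized in the snapshot, so the first real request after a restore does not pay for it again.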

AWS SnapStart Pricing

Pricing. The thing is, when SnapStart for Java was released, it was free, but AWS recognized that everybody uses SnapStart on staging for every function, and this is their storage and their CPU to restore. When they released it for Python and .NET, they introduced pricing for the cache and for the restore phase. It's currently still free for Java, but expect that AWS might change that. They didn't want to piss off Java developers, who are used to having it for free, but they might observe this and say that it will be the same pricing for you, because it's their money and people are abusing that. Once again, you need to think about uniqueness when you restore from the snapshot. Don't use something like System.currentTimeMillis() in a static initializer block, because the value is captured when the snapshot is taken, and after every restore you will see that same time.
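A small sketch of the uniqueness problem, assuming nothing beyond the JDK: a value computed in a static initializer is evaluated once during initialization and would be frozen into the snapshot, while a value computed inside the invocation path is evaluated after the restore. The class and member names are hypothetical.

```java
final class InvocationClock {
    // Anti-pattern under SnapStart: evaluated once during initialization, so the value
    // would be frozen into the snapshot and seen unchanged after every restore.
    static final long FROZEN_AT_INIT = System.currentTimeMillis();

    /** Preferred: read the clock inside the invocation path, after the restore. */
    static long invocationTimeMillis() {
        return System.currentTimeMillis();
    }

    /** How stale the init-time value is at the moment of the call. */
    static long stalenessMillis() {
        return invocationTimeMillis() - FROZEN_AT_INIT;
    }
}
```

The same reasoning applies to anything else that is meant to be unique per execution environment and is computed during initialization.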

The same goes for time-based caches: think about not using them, because you don't know when the snapshot will be restored. Something will be restored and the cache is empty. Say you cache this and that for 8 hours: if the snapshot is restored within these 8 hours, you have the values; after 8 hours, you don't. Don't rely on that. Think about how to mitigate that, or basically don't use it. There are other caches like that.
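One possible mitigation, sketched under assumptions: never assume a value cached during initialization is still present, and instead check its age at read time and reload when it is missing or expired. `ExpiringValue` is a hypothetical helper, not something shown in the talk, and the clock is injected only to make the behavior testable.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

final class ExpiringValue<T> {
    private final Clock clock;
    private final Duration ttl;
    private final Supplier<T> loader;
    private T value;
    private Instant loadedAt; // null until the first load

    ExpiringValue(Clock clock, Duration ttl, Supplier<T> loader) {
        this.clock = clock;
        this.ttl = ttl;
        this.loader = loader;
    }

    /** Returns the cached value, reloading it when it is missing or at least as old as the TTL. */
    synchronized T get() {
        Instant now = clock.instant();
        if (loadedAt == null || Duration.between(loadedAt, now).compareTo(ttl) >= 0) {
            value = loader.get();
            loadedAt = now;
        }
        return value;
    }
}
```

With this shape, a snapshot restored within the TTL serves the cached value and a snapshot restored later reloads it, so the calling code behaves the same in both cases.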

AWS SnapStart, Challenges and Limitations

There are limitations. The most important thing: when you deploy Lambda functions with SnapStart, the snapshot is taken. It needs to be secured and stored in all availability zones, and it takes two to two-and-a-half minutes per Lambda function. This is developer experience. This is the tradeoff. If you deploy many functions in parallel, the snapshots will also be taken in parallel. You will not be penalized for deploying an application with 10 Lambda functions, but there are these additional two to two-and-a-half minutes. Maybe don't do this for staging if you would like to test quickly and it's not about a performance test. This is the basic limitation. Also, the snapshot is currently deleted if you don't invoke the Lambda function for 14 days. They save on storage. They don't want this. If they introduce pricing, they will remove this limitation for Java developers because it's then up to you, if you pay for storage. Currently, they protect themselves because, once again, everybody activated this on staging and then they don't do any tests and AWS pays. It's, of course, not the reason for that.

GraalVM

Now let's compare to GraalVM. What's important here is that it provides ahead-of-time compilation, so it's a closed world: it builds the native image with everything that is reachable, methods, classes, fields, and everything else. AOT is good for serverless because of things like startup speed, low memory footprint, and small packaging, because only reachable things will be packaged. These are basically the properties of the serverless world. AOT is good stuff. Of course, native executables are smaller and they don't need too much memory because there is no just-in-time compiler. It's a native image. The promise is that by using a native image, you can attack both start time and memory. AWS doesn't provide you a managed GraalVM runtime, but they provide you the possibility to deploy a custom runtime. With a custom runtime, you can deploy a GraalVM image, which is basically a ZIP file with a native image in it. There's a format. You need to call it function.zip, and the bootstrap file in it should be the GraalVM Native Image.

Once again, this is not the talk on how to build it. You can look into this. This is for comparison. I'm now testing GraalVM 25. I added here, next to the SnapStart results that we saw, GraalVM 23 Native Image. You see the cold starts are even lower, and these are not the last 70, but all 100. You also get predictable, low cold starts here. GraalVM even provides low warm start times because of the mentioned properties. It's a valid technology, and that holds even for the max value and p99.9. Really cool thing, but you need to see if your dependencies are GraalVM ready. GraalVM has a page for this, so you can see that my frameworks, like Quarkus, Helidon, Spring, Micronaut, are all there. They also publish the dependencies, because sometimes the dependencies you use bring in other dependencies, and those should be GraalVM ready too.

Why? Because you cannot do any runtime instantiation out of the box. It's a closed world, a native image. If you don't put the class into the native image, it will not be found. You cannot just load a class by name. There is a possibility to declare those classes in advance, and they will be packaged. These are the classes that are only reachable via reflection. For example, the Lambda function itself will be dynamically instantiated by AWS. They look into the infrastructure as code, see this class, and instantiate it, and the same is true for other stuff. If you use GraalVM, it requires a bit of training to find all those classes. Otherwise, you will get runtime exceptions like ClassNotFoundException. This is especially painful if you use loggers. You need to declare all that stuff. I use SLF4J here. It's just an additional declaration. Sometimes you only do it once, but it's painful.
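The kind of code that causes this is plain reflective instantiation by class name. The sketch below uses only standard JDK reflection; on a regular JVM it works for any class on the classpath, while in a closed-world native image the same lookup can typically only succeed for classes that were declared for reflection at build time. The class name `ReflectiveLoader` is hypothetical.

```java
final class ReflectiveLoader {
    /** Instantiates a class by name through its no-arg constructor, the way a runtime or framework would. */
    static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return type.getDeclaredConstructor().newInstance();
    }

    /** True if the class can be resolved by name in the current runtime. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Because the class name is only a string here, static analysis cannot see which class will be needed, which is why such classes have to be declared in advance.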

Luckily, people using Log4j have had GraalVM support for several months, since version 2.25.0. Before that, it was nearly impossible to declare everything. I simply stopped after 3 hours. Now, it all comes included for you. Finally, you can do this. There is also the possibility to use assisted configuration with the GraalVM Tracing Agent. You need to attach it to your test run, and it finds everything for your application by observing the execution of your test classes, what will be executed at runtime, and it more or less generates these declarations for you. Sometimes things are still missing. If your tests are not complete and don't go through those paths, then you can still miss stuff. GraalVM is a powerful technology with a lot of potential. It really improves cold start and memory footprint. Once again, you manage the pipeline to create the GraalVM Native Image. That's your work.

That's why I like SnapStart: it's managed. Taking the snapshot, storing the snapshot, restoring it, you don't manage any of that. Here, the pipeline to build the image is on your shoulders. In my examples, building such a native image for one Lambda function requires 6 gigabytes of memory, and it also takes up to 3 minutes. It may be that you use CI/CD in the cloud and it scales like hell, but it will be your cost anyway. The developer experience with 3 minutes is still there. It's comparable to SnapStart's two-and-a-half minutes. This is how it is. You also need to be careful with runtime errors. You can update one dependency, it brings in some dependency which is not GraalVM ready anymore, and the code breaks. You need to carefully watch what you deploy.

Wrap-Up and Suggestions

These are basically my suggestions. They are both valid technologies, but if you're willing to accept slightly higher cold and warm start times, I would probably use SnapStart as a starting point because it's fully managed. If it's about every millisecond, go for GraalVM. That's fine. You will probably need to work a bit more. There are edge cases. Or at least that was my suggestion until September 17, and I will say why. Once again, there is Powertools for Java. It has helper methods to deal with idempotency, logging, tracing, but it's a huge library. Measure the cold and warm starts. I will release a series about this later. If you use it, it's just additional megabytes.

This is what came on the day of the release of Java 25: GraalVM announced that it is detaching from the Java ecosystem train, and they are saying they are focusing on non-Java GraalVM languages and not on GraalVM Native Image anymore. Nobody understands that. I also ask, what does it mean? Will they support GraalVM? Should we still use it or not? They basically point to Project Leyden, which comes from OpenJDK, and it came with Java 25. Project Leyden's goal is to improve startup, time to peak performance, and even the footprint of Java programs. You can test it. It's not the same as GraalVM Native Image. They basically have an AOT cache and a class data sharing cache. With that, you will not get the performance of GraalVM. It's worse. That's why we are currently puzzled. What does it mean for GraalVM Native Image? Will it be supported or not? I don't think it's cool to release something like, "The last version was the last that we supported fully after Java 25 was released." You need to announce something like this in advance, but we cannot change it. We all would like clarification.

On the slides, you will see the links to the GitHub repo. You can take everything and re-measure, but note that I measured this with Java 21. You can do it later with 24. AWS updates the runtime, so the minor version has changed, and hopefully it's improved. SnapStart might improve. Firecracker might improve, so you might get different results. One thing that I really see: I use a bunch of AWS dependencies, and if you upgrade the versions, and I tested this several months ago, the tendency with new versions is that the size of the dependencies increases. You saw that an increase in size means an increase in cold start. It's especially true for something else I measured, which is another topic: I was on Spring Boot 3.2, then I switched to Spring Boot 3.4, and there was a 10% size increase overall only by updating. It becomes bigger and bigger.

You can find me on dev.to. There I went through all this tuning stuff, how you can do it, in more depth. I released series about Spring Boot, Quarkus, and Micronaut, if you use them. I'm still adding stuff, and we'll remeasure with 25 and Spring Boot 3.6, so you can look into this. I also released other articles for the case that you don't want to use DynamoDB but want to use Postgres, because there are changes in how to deal with that. Several months ago, AWS released the Aurora DSQL database, which is very suitable for the serverless world. It's Postgres compatible, but not 100% compatible. I will measure with that because it finally gives you the possibility to use a relational database with Lambda without thinking about connection management, which is a big issue.

We have Lambda functions taking all the connections from the database. Aurora DSQL solves at least that problem, so watch out. I will measure. Basically, the message is that you can bring the cold start down quite significantly, just with SnapStart. If you look into this, it gives you lots of opportunities. I hope it will be improved even further. You will be below 1 second, and if that happens only 2% or 3% of the time, I think it's valid. If you have solid Java knowledge in your company and would like to go serverless or use Java on AWS Lambda, why not? That's basically the message. You don't need to be too afraid.

Recorded at:

Jun 15, 2026

Vadym Kazulkin