Tags: java, amazon-web-services, aws-lambda, serverless-framework, cold-start

How to improve performance of initial calls to AWS services from an AWS Lambda (Java)?


I recently tried to analyze some performance issues on a service hosted in AWS Lambda. Breaking down the issue, I realized that it only occurred on the first calls in each container. To isolate it, I created a new test project as a simple example.

Test project (You can clone it, build it with mvn package, deploy it with sls deploy, and then test it via the AWS Management Console.)

This project has two AWS Lambda functions: source and target. The target function simply returns an empty JSON object {}. The source function invokes the target function using the AWS Lambda SDK.

The approximate duration of the target function is 300-350 ms on cold starts and 1ms on hot invokes. The approximate duration of the source function is 6000-6300ms on cold starts and 280ms on hot invokes.

The 6 seconds of overhead on cold starts of the source function appears to be about 3 seconds for getting the client and 3 seconds for invoking the other function. On hot invokes those steps take 3 ms and 250 ms respectively. I get similar times for other services like AWS SNS.

I don't really understand what it is doing in those 6 seconds and what I can do to avoid it. When doing warmup calls, I can get the client and store the reference to avoid the first few seconds, but the other few seconds come from actually using the other service (SNS, Lambda, etc), which I can't really do as a no-op.
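For reference, the "store the reference" part of that warmup approach amounts to something like the sketch below. ExpensiveClient and SourceHandler are hypothetical stand-ins, not real AWS SDK classes; the only point is that a static field is initialized once per container and then reused by every later invoke.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an SDK client; counts how often it is constructed.
final class ExpensiveClient {
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveClient() {
        CREATED.incrementAndGet();
    }

    // Pretends to call the target function, which returns an empty JSON object.
    String call() {
        return "{}";
    }
}

// Hypothetical handler class; the name is an assumption for illustration.
final class SourceHandler {
    // Created once when the class is initialized, then shared by all invokes
    // that land on the same container.
    private static final ExpensiveClient CLIENT = new ExpensiveClient();

    String handle() {
        return CLIENT.call();
    }
}
```

This only removes the client-creation cost from later invokes; it does nothing for the first real call through the client, which is the part I can't warm up.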

So, do other people experience the same cold start durations, and what can I do to improve them (other than bringing the memory setting up)?


Solution

  • The main reason for slow cold-start times with a Java Lambda is the need to load classes and initialize objects. For simple programs this can be very fast: a Lambda that does nothing other than print "Hello, World" will run in ~40 ms, which is similar to the Python runtime. On the other hand, a Spring app will take much more time to start up, because even a simple Spring app loads thousands of classes before it does anything useful.

    While the obvious way to reduce your cold-start times is to reduce the number of classes that you need to load, this is rarely easy to do, and often not possible. For example, if you're writing a web-app in Spring there's no way around initializing the Spring application context before processing a web request.
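    To see how many classes a given piece of startup code pulls in, the JVM's standard ClassLoadingMXBean can be sampled before and after it runs. This is a minimal sketch: ClassLoadProbe and classesLoadedBy are hypothetical names, the regex call is just a stand-in workload, and the count is approximate because other threads may load classes at the same time.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any AWS library.
final class ClassLoadProbe {

    // Runs the task and returns how many classes the JVM loaded while it ran.
    static long classesLoadedBy(Runnable task) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        task.run();
        return bean.getTotalLoadedClassCount() - before;
    }

    public static void main(String[] args) {
        // Stand-in workload: replace with the initialization you want to measure.
        Runnable work = () -> Pattern.compile("a+b").matcher("aab").matches();

        System.out.println("first run loaded  : " + classesLoadedBy(work));
        System.out.println("second run loaded : " + classesLoadedBy(work));
    }
}
```

    Wrapping SDK client creation in the same helper would give a rough idea of how much of a cold start is class loading.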

    If that's not an option, and you're using the Maven Shade plugin to produce an "uber-JAR", you should switch to the Assembly plugin as I describe here. The reason is that Lambda unpacks your deployment bundle, so an "uber-JAR" turns into lots of tiny classfiles that have to be individually opened.

    Lastly, increase your memory allotment. This is without question the best thing that you can do for Lambda performance, Java or otherwise. First, increasing memory reduces the amount of work that the Java garbage collector has to do. Second, the amount of CPU that your Lambda gets depends on the memory allotment: you don't get a full virtual CPU until 1,769 MB. I recommend that you give a Java app 2 GB; the cost of the bigger allotment is often offset by shorter execution time.
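    As a rough illustration of the memory-to-CPU relationship, the sketch below estimates the CPU share for a few memory settings. It assumes the share scales linearly with memory and reaches one vCPU at 1,769 MB; LambdaCpuEstimate and approxVcpus are hypothetical names, and the result is an estimate rather than a guarantee.

```java
// Hypothetical helper; the linear scaling is an assumption for illustration.
final class LambdaCpuEstimate {
    // Memory setting at which a function gets the equivalent of one vCPU.
    static final double MB_PER_VCPU = 1769.0;

    // Estimated vCPU share for a given memory setting in MB.
    static double approxVcpus(int memoryMb) {
        return memoryMb / MB_PER_VCPU;
    }

    public static void main(String[] args) {
        for (int mb : new int[] {512, 1024, 2048, 4096}) {
            System.out.printf("%4d MB -> ~%.2f vCPU%n", mb, approxVcpus(mb));
        }
    }
}
```

    Under that assumption a 512 MB function gets less than a third of a vCPU, which is why CPU-heavy initialization is so slow at small memory sizes.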

    One thing I would not do is pay for provisioned concurrency. If you want a machine up and running all the time, use ECS/EKS/EC2. And recognize that if you have a bump in demand, you're still going to get cold starts.


    Update: I spent some time over the holiday quantifying various performance improvement techniques. The full writeup is here, but the numbers are worth repeating.

    My example program was, like the OP's, a "do nothing" program that just created an SDK client and used it to invoke an API:

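    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.logs.AWSLogs;
    import com.amazonaws.services.logs.AWSLogsClientBuilder;

    // Handler method only; the enclosing class is omitted (AWS SDK for Java v1).
    public void handler(Object ignored, Context context)
    {
        long start = System.currentTimeMillis();

        // Step 1: create the SDK client with default configuration.
        AWSLogs client = AWSLogsClientBuilder.defaultClient();

        long clientCreated = System.currentTimeMillis();

        // Step 2: make the first API call through the new client.
        client.describeLogGroups();

        long apiInvoked = System.currentTimeMillis();

        // Report both durations in milliseconds.
        System.err.format("time to create SDK client = %6d\n", (clientCreated - start));
        System.err.format("time to make API call     = %6d\n", (apiInvoked - clientCreated));
    }
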

    I ran this with different memory sizes, forcing a cold start each time. All times are in milliseconds:

    |                   |  512 MB | 1024 MB | 2048 MB | 4096 MB |
    |-------------------|---------|---------|---------|---------|
    | Create client     |    5298 |    2493 |    1272 |    1019 |
    | Invoke API call   |    3844 |    2023 |    1061 |     613 |
    | Billed duration   |    9213 |    4555 |    2349 |    1648 |
    

    As I said above, the primary benefit that you get from increasing memory is that you increase CPU at the same time. Creating and initializing an SDK client is CPU-intensive, so the more CPU you can give it, the better.
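    The billed durations in the table also show why a bigger allotment need not cost more. Lambda bills compute by memory multiplied by duration (GB-seconds), so the sketch below multiplies the two for each column. BilledCost and gbSeconds are hypothetical names, and the calculation ignores per-request charges and any pricing tiers.

```java
// Hypothetical helper that turns the table's figures into GB-seconds.
final class BilledCost {

    // GB-seconds = (memory in GB) x (billed duration in seconds).
    static double gbSeconds(int memoryMb, long billedMs) {
        return (memoryMb / 1024.0) * (billedMs / 1000.0);
    }

    public static void main(String[] args) {
        // Memory sizes and billed durations copied from the table above.
        int[] memoryMb = {512, 1024, 2048, 4096};
        long[] billedMs = {9213, 4555, 2349, 1648};

        for (int i = 0; i < memoryMb.length; i++) {
            System.out.printf("%4d MB: %5d ms billed = %.3f GB-seconds%n",
                    memoryMb[i], billedMs[i], gbSeconds(memoryMb[i], billedMs[i]));
        }
    }
}
```

    For these measurements the result is roughly 4.6, 4.6, 4.7, and 6.6 GB-seconds: going from 512 MB to 2048 MB cuts the cold start to about a quarter of the time at nearly the same compute cost, while 4096 MB costs noticeably more for a smaller gain.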


    Update 2: this morning I tried compiling a simple AWS program with GraalVM. It took several minutes to build the stand-alone executable, and even then it created a "fallback image" (which has an embedded JDK) due to dependencies of the AWS SDK. When I compared runtimes, there was no difference from running with standard Java.

    Bottom line: use Java for things that will run long enough to benefit from HotSpot. Use a different language (Python, JavaScript, perhaps Go) for things that are short-running and need low latency.