Tags: python, amazon-web-services, function, aws-lambda, concurrency

AWS Lambda ramps up to 15 concurrent executions and then stops executing code inside lambda_handler before dropping to 1 concurrent execution


I have an AWS Lambda function with a timeout of 900 seconds (15 minutes). It was set up to run every 20 minutes using an EventBridge schedule. The code also contains logic that ends the function once it reaches 14 minutes of runtime. It was running perfectly.

Then I changed the function's reserved concurrency limit to 15 and changed the EventBridge schedule to every minute. This is where it started getting weird.

[Screenshot: concurrent executions metric after the schedule change]

The function was firing every one minute, but once it reached 15 concurrent executions, it appeared to have stopped executing the code contained inside the lambda_handler function. Then the number of concurrent executions dropped from 15 to 1. What I mean by the function code not being executed is this:

[Screenshot: CloudWatch log entry for a 1.74 ms invocation]

Note how the whole execution takes 1.74 ms. I put a print() statement right below the lambda_handler() definition to see if it got executed, but it doesn't even get that far. It looks as if the function gets called, but none of the code within it gets executed. Interestingly, if I update the function and redeploy, it returns to running normally and executing the code again. This points to cold starts working fine, but warm instances not behaving correctly.
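For context, module-level code only runs on a cold start, so state defined there persists across warm invocations. A minimal sketch of how one could observe container reuse (illustrative names, not my actual function):

from datetime import datetime

# Module-level code runs once per execution environment (i.e. on a cold start),
# so this timestamp stays fixed for every invocation that reuses the container.
container_started_at = datetime.now().isoformat()
invocation_count = 0

def lambda_handler(event, context):
    global invocation_count
    invocation_count += 1
    # A fixed container_started_at with a growing invocation_count means
    # the invocation reused a warm container.
    print(f'Container started at {container_started_at}, invocation #{invocation_count}')
    return {'statusCode': 200}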

I enabled X-Ray, and all it really shows is that the function execution lasted 2 ms. Is there something else I can look at?

Any ideas would be appreciated. Thanks.

Update 1 (Code of Lambda Function)

Note - I hid the implementation of process_symbol() as it just downloads data for a stock symbol and saves it to S3; get_next_symbol_to_load() and update_rds() are likewise omitted for brevity.

import os
import json
import s3fs
import urllib3

from datetime import datetime

MAX_RUNNING_TIME_SECONDS = 900

# Timing
start_execution_time = datetime.now()

def get_execution_time_remaining():

    return MAX_RUNNING_TIME_SECONDS - (datetime.now() - start_execution_time).seconds

def lambda_handler(event, context):

    print('Execution beginning')

    symbols = []
    execution_time_remaining = get_execution_time_remaining()

    while execution_time_remaining > 60:

        # Get next symbol to load
        symbol_metadata = get_next_symbol_to_load()

        if symbol_metadata:
    
            symbol = symbol_metadata[0]
            symbols.append(symbol)
        
            # process_symbol(symbol)
            # update_rds(symbol)

        execution_time_remaining = get_execution_time_remaining()
        print(f'execution_time_remaining = {execution_time_remaining}')

    return_message = ''
    current_time_text = datetime.now().strftime('%m/%d/%Y, %H:%M:%S.%f')

    if symbols:
        return_message = f"Successfully saved {', '.join(symbols)} to the cloud at {current_time_text}."
    else:
        return_message = f'Successful execution at {current_time_text} but no symbols were processed.'

    return {
        "statusCode": 200,
        "body": return_message,
    }

Update 2

Also noteworthy: after it ramps up to 15 concurrent executions and stops executing the code in lambda_handler, there's a break of about 80-90 minutes before it somehow starts executing the function code again.

[Screenshot: concurrent executions metric showing the 80-90 minute gap]


Solution

  • As @Maurice hinted at, there was a global variable affecting the execution of the code in a concurrency scenario. The assignment:

    start_execution_time = datetime.now()
    

    which previously sat above lambda_handler(), when moved inside the handler, made the concurrency work as expected. This blog post was also helpful:

    https://pfisterer.dev/posts/aws-lambda-container-reuse/

    as it discusses how Lambda reuses containers and what effect that can have on executions.
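    For reference, a minimal sketch of the corrected structure (passing the timestamp as an explicit parameter is just one way to wire it up; the symbol-processing loop is elided):

        from datetime import datetime

        MAX_RUNNING_TIME_SECONDS = 900

        def get_execution_time_remaining(start_execution_time):
            return MAX_RUNNING_TIME_SECONDS - (datetime.now() - start_execution_time).seconds

        def lambda_handler(event, context):
            # Capture the start time inside the handler so a reused (warm)
            # container doesn't carry over a stale timestamp from a cold start.
            start_execution_time = datetime.now()

            execution_time_remaining = get_execution_time_remaining(start_execution_time)
            while execution_time_remaining > 60:
                # ... load and process the next symbol as before ...
                execution_time_remaining = get_execution_time_remaining(start_execution_time)

            return {'statusCode': 200, 'body': 'done'}

    A related option is the Lambda context object's get_remaining_time_in_millis() method, which reports the time left in the current invocation without tracking a start time at all.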