I have an AWS Lambda function with a timeout of 900 seconds (15 minutes). It was set up to run every 20 minutes using an EventBridge schedule. The code also contains logic to end the function once it reaches 14 minutes of runtime. It was running perfectly.
Then, I changed the reserved concurrency limit to 15 for the function, and changed the EventBridge schedule to every one minute. This is where it started getting weird.
The function was firing every one minute, but once it reached 15 concurrent executions, it appeared to have stopped executing the code contained inside the lambda_handler function. Then the number of concurrent executions dropped from 15 to 1. What I mean by the function code not being executed is this:
Note how the whole execution takes 1.74ms. I put a print() statement right below the lambda_handler() to see if it got executed but it doesn't even get that far. It looks as if the function gets called, but none of the code within it gets executed. Interestingly, if I update the function and redeploy, it returns to running normally again and executing the code. This would point to cold starts working ok, but warm instances not behaving correctly.
I enabled X-Ray, and all it really shows is that the function execution lasted 2ms. Is there something else I can look at?
Any ideas would be appreciated. Thanks.
Update 1 (Code of Lambda Function)
Note - I hid the implementation of process_symbol() as it's just downloading data for a stock symbol and saving it to S3.
import os
import json
import s3fs
import urllib3
from datetime import datetime

MAX_RUNNING_TIME_SECONDS = 900

# Timing
start_execution_time = datetime.now()


def get_execution_time_remaining():
    return MAX_RUNNING_TIME_SECONDS - (datetime.now() - start_execution_time).seconds


def lambda_handler(event, context):
    print('Execution beginning')
    symbols = []
    execution_time_remaining = get_execution_time_remaining()
    while execution_time_remaining > 60:
        # Get next symbol to load
        symbol_metadata = get_next_symbol_to_load()
        if symbol_metadata:
            symbol = symbol_metadata[0]
            symbols.append(symbol)
            # process_symbol(symbol)
            # update_rds(symbol)
        execution_time_remaining = get_execution_time_remaining()
        print(f'execution_time_remaining = {execution_time_remaining}')

    return_message = ''
    current_time_text = datetime.now().strftime('%m/%d/%Y, %H:%M:%S.%f')
    if symbols:
        return_message = f"Successfully saved {', '.join(symbols)} to the cloud at {current_time_text}."
    else:
        return_message = f'Successfully execution at {current_time_text} but no symbols were processed.'

    return {
        "statusCode": 200,
        "body": return_message,
    }
Update 2
Also noteworthy, after it ramps up to 15 concurrent executions and stops executing the code contained in the lambda function, there's a break of about 80-90 minutes, before it somehow starts executing the function code again.
As @Maurice hinted at, there was a global variable affecting the execution of the code in a concurrency scenario. The assignment:
start_execution_time = datetime.now()
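was originally placed at module level, above lambda_handler(). Module-level code runs only when a container is first initialized, so warm invocations reused the old start time and saw almost no time remaining. Moving the assignment inside the handler made the function behave as expected under concurrency. This blog post was also useful: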
https://pfisterer.dev/posts/aws-lambda-container-reuse/
as it discusses how Lambda reuses containers and what effect that might have on executions.
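For anyone hitting the same thing, here is a minimal sketch of the fix, not my exact production code. It captures the start time inside the handler so every invocation gets a fresh time budget. Reading symbols from the event, the `SAFETY_MARGIN_SECONDS` name, and the use of `time.monotonic()` instead of `datetime.now()` are assumptions made only to keep the example self-contained; the real function calls get_next_symbol_to_load().

```python
import time

MAX_RUNNING_TIME_SECONDS = 900
SAFETY_MARGIN_SECONDS = 60  # assumed name for the 60-second cutoff in the loop


def get_execution_time_remaining(start_time, now=None):
    """Seconds left in the time budget, measured from start_time."""
    now = time.monotonic() if now is None else now
    return MAX_RUNNING_TIME_SECONDS - (now - start_time)


def lambda_handler(event, context):
    # Captured on every invocation, so a reused (warm) container
    # starts each run with the full time budget.
    start_time = time.monotonic()
    print('Execution beginning')

    symbols = []
    # Hypothetical input: symbols come from the event here purely for
    # illustration; the original code fetches them one at a time.
    pending = list(event.get('symbols', []))
    while pending and get_execution_time_remaining(start_time) > SAFETY_MARGIN_SECONDS:
        symbols.append(pending.pop(0))

    return {"statusCode": 200, "body": f"Processed {len(symbols)} symbol(s)."}
```

With the start time at module level, a container that had been alive for 850 seconds would compute 50 seconds remaining on its next invocation and skip the loop entirely. The Lambda context object also provides get_remaining_time_in_millis(), which reports the time left for the current invocation and may be a simpler option than tracking the start time by hand.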