I have two Lambda functions written in Python:
Lambda function 1: Gets 'new' data from an API, gets 'old' data from an S3 bucket (if exists), compares the new to the old and creates 3 different lists of dictionaries: inserts, updates, and deletes. Each list is passed to the next lambda function in batches (~6MB) via Lambda invocation using RequestResponse
. The full datasets can vary in size from millions of records to 1 or 2.
Lambda function 2: Handles each type of data (insert, update, delete) separately - specific things happen for each type, but eventually each batch is written to MySQL using pymysql
executemany
.
I can't figure out the best way to handle errors. For example, let's say one of the batches being written contains a single record that has a NULL
value for a field that is not allowed to be NULL
in the database. That entire batch fails and I have no way of figuring out what was written to the database and what wasn't for that batch. Ideally, a notification would be triggered and the rouge record would be written somewhere where it could be human reviewed - all other records would be successfully written
Ideally, I could use something like the Bisect Batch on Function Failure
available in Kinesis Firehose
. It will recursively split failed batches into smaller batches and retry them until it has isolated the problematic records. These will then be sent to DLQ if one is configured. However, I don't think Kenesis Firehose
will work for me because it doesn't write to RDS and therefore doesn't know which records fail.
This person https://stackoverflow.com/a/58384445/6669829 suggested using execute
if executemany
fails. Not sure if that will work for the larger batches. But perhaps if I stream the data from S3 instead of invoking via RequestResponse
this could work?
This article (AWS Lambda batching) talks about going from Lambda to SQS to Lambda to RDS, but I'm not sure how specifically you can handle errors in that situation. Do you have to send one record at a time?
This blog uses something similar, but I'm still not sure how to adapt this for my use case or if this is even the best solution.
Looking for help in any form I can get; ideas, blog posts, tutorials, videos, etc.
Thank you!
I do have a few suggestions focused on organization, debugging, and resiliency - please keep in mind assumptions are being made about your architecture
Organization
You currently have multiple dependent lambda's that are processing data. When you have a processing flow like this, the complexity of what you're trying to process dictates if you need to utilize an orchestration tool.
I would suggest orchestrating your lambda's via AWS Step Functions
Debugging
At the application level - log anything that isn't PII
Now that you're using an orchestration tool, utilize the error handling of Step Functions along with any business logic in the application to error appropriately if conditions for the next step aren't met
Resiliency
Life happens, things break, incorrect code get's pushed
Design your orchestration to put the failing event(s) your lambdas receive into a processing queue (AWS SQS, Kafka, etc..)- you can reprocess your events or if the events are at fault, DLQ them.
Here's a nice article on the use of orchestration with a design use case - give it a read