Tags: amazon-web-services, amazon-s3, aws-lambda, amazon-cloudfront, rds

Trigger RDS lambda on CloudFront access


I'm serving static JS files from my S3 bucket over CloudFront, and I want to monitor who accesses them. I don't want to do this through CloudWatch and the like; I want to log it on my own.

For every request to CloudFront, I'd like to trigger a Lambda function that inserts data about the request into my MySQL RDS instance.

However, CloudFront restricts Viewer Request and Viewer Response triggers too much: a 1-second timeout (too little to connect to MySQL), no VPC configuration for the Lambda (so I can't even reach the RDS subnet), and so on.

What is the optimal way to achieve this? Should I set up an API Gateway, and if so, how would I send a request to it?


Solution

  • This seems like a suboptimal strategy, since CloudFront suspends request/response processing while the trigger code is running -- the Lambda code in a Lambda@Edge trigger has to finish executing before processing of the request or response continues, hence the short timeouts.

    CloudFront provides logs that are dropped multiple times per hour (depending on the traffic load) into a bucket you select, which you can capture from an S3 event notification, parse, and insert into your database.
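
    As a rough sketch of that approach (Node.js, assuming the mysql2 client is packaged with the function and that a hypothetical cf_log_entries table plus DB_* environment variables exist), the S3-triggered logger could look something like this:

    ```javascript
    // Hypothetical S3-triggered logger: parses CloudFront standard logs and inserts rows into RDS.
    'use strict';

    const zlib = require('zlib');
    const { S3 } = require('aws-sdk');       // SDK v2, bundled with the older Node.js runtimes
    const mysql = require('mysql2/promise'); // assumption: packaged with the function

    const s3 = new S3();

    exports.handler = async (event) => {
      // One connection per invocation; pooling/caching is left out for brevity.
      const db = await mysql.createConnection({
        host: process.env.DB_HOST,
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
        database: process.env.DB_NAME,
      });

      try {
        for (const record of event.Records) {
          const { bucket, object } = record.s3;
          const obj = await s3.getObject({
            Bucket: bucket.name,
            Key: decodeURIComponent(object.key.replace(/\+/g, ' ')),
          }).promise();

          // CloudFront standard logs are gzipped TSV with a "#Fields:" header line.
          const lines = zlib.gunzipSync(obj.Body).toString('utf8').split('\n');
          const fields = lines.find((l) => l.startsWith('#Fields:')).replace('#Fields: ', '').trim().split(' ');

          for (const line of lines.filter((l) => l && !l.startsWith('#'))) {
            const row = Object.fromEntries(line.split('\t').map((value, i) => [fields[i], value]));
            await db.execute(
              'INSERT INTO cf_log_entries (log_date, log_time, client_ip, uri, status, user_agent) VALUES (?, ?, ?, ?, ?, ?)',
              [row.date, row.time, row['c-ip'], row['cs-uri-stem'], row['sc-status'], row['cs(User-Agent)']]
            );
          }
        }
      } finally {
        await db.end();
      }
    };
    ```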

    However...

    If you really need real-time capture, your best bet might be to create a second Lambda function, inside your VPC, that accepts the data structures provided to the Lambda@Edge trigger.

    Then, inside the code for the viewer request or viewer response trigger, all you need to do is use the built-in AWS SDK to invoke your second Lambda function asynchronously, passing the event to it.

    That way, the logging task is handed off, you don't wait for a response, and the CloudFront processing can continue.

    If you really want to take this route, I'd suggest this as the best alternative. One Lambda function can easily invoke a second one, even if the second function is not in the same account, region, or VPC, because the invocation is done by communicating with the Lambda service's endpoint API.
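
    As a sketch of that hand-off (Node.js; the function name request-logger, its region, and the SDK v2 syntax are assumptions that depend on which runtime you choose), the viewer-request trigger could look roughly like this:

    ```javascript
    // Hypothetical viewer-request trigger: hand the event to a logging function, then let
    // CloudFront continue. Lambda@Edge does not support environment variables, so the
    // target name and region are hard-coded placeholders here.
    'use strict';

    const AWS = require('aws-sdk');

    // The logging function lives in one fixed region (inside your VPC there).
    const lambda = new AWS.Lambda({ region: 'eu-west-1' }); // placeholder region

    exports.handler = async (event) => {
      const request = event.Records[0].cf.request;

      // InvocationType 'Event' queues the invocation and returns without waiting for the result.
      await lambda.invoke({
        FunctionName: 'request-logger', // placeholder name of the 2nd function
        InvocationType: 'Event',
        Payload: JSON.stringify(event),
      }).promise();

      // Return the request unchanged so CloudFront keeps processing it.
      return request;
    };
    ```

    The edge function's execution role would also need lambda:InvokeFunction permission on that target.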

    But, there's still room for some optimization, because you have to take another aspect of Lambda@Edge into account, and it's indirectly related to this:

    "no VPC configuration for the Lambda"

    There's an important reason for this. Your Lambda@Edge trigger code is run in the region closest to the edge location that is handling traffic for each specific viewer. Your Lambda@Edge function is provisioned in us-east-1, but it's then replicated to all the regions, ready to run if CloudFront needs it.

    So, when you are calling that 2nd Lambda function mentioned above, you'll actually be reaching out to the Lambda API in the 2nd function's region -- from whichever region is handling the Lambda@Edge trigger for this particular request.

    This means the delay will be larger the further apart the two regions are.

    Thus your truly optimal solution (for performance purposes) is slightly more complex: instead of the L@E function invoking the 2nd Lambda function asynchronously by making a request to the Lambda API, you can create one SNS topic in each region and subscribe the 2nd Lambda function to all of them. (SNS can invoke Lambda functions across regional boundaries.)

    Your Lambda@Edge trigger code then simply publishes a message to the SNS topic in its own region, which returns a response immediately and asynchronously invokes the remote Lambda function (the 2nd function, in your VPC in one specific region). Within your Lambda@Edge code, the environment variable process.env.AWS_REGION tells you which region you are currently running in, so you can use it to pick the SNS topic in that region and keep the latency minimal. (When testing, this is always us-east-1.)
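
    A sketch of that per-region variant (again Node.js, with a placeholder account ID and topic name, and sns:Publish permission on the topics assumed for the execution role):

    ```javascript
    // Hypothetical viewer-request trigger: publish the event to the SNS topic in whichever
    // region this replica of the function happens to be running in.
    'use strict';

    const AWS = require('aws-sdk');

    const region = process.env.AWS_REGION; // e.g. "us-east-1" when testing
    const sns = new AWS.SNS({ region });
    const topicArn = `arn:aws:sns:${region}:123456789012:cf-request-log`; // placeholder account/topic

    exports.handler = async (event) => {
      const request = event.Records[0].cf.request;

      // Publishing returns as soon as SNS accepts the message; delivery to the 2nd function
      // (subscribed to every regional topic) happens asynchronously.
      await sns.publish({
        TopicArn: topicArn,
        Message: JSON.stringify(event),
      }).promise();

      return request;
    };
    ```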

    Yes, it's a bit convoluted, but it seems like the way to accomplish what you are trying to do without imposing substantial latency on request processing -- Lambda@Edge hands off the information as quickly as possible to another service that will assume responsibility for actually generating the log message in the database.
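
    For completeness, a sketch of the in-VPC function on the receiving end, which unwraps the SNS message and writes the row (mysql2, the cf_requests table, and the DB_* environment variables are again assumptions):

    ```javascript
    // Hypothetical in-VPC logging function: subscribed to the per-region SNS topics, it
    // unwraps the original Lambda@Edge event and inserts one row per request into RDS.
    'use strict';

    const mysql = require('mysql2/promise'); // assumption: packaged with this function

    exports.handler = async (event) => {
      const db = await mysql.createConnection({
        host: process.env.DB_HOST,
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
        database: process.env.DB_NAME,
      });

      try {
        for (const record of event.Records) {
          // The Lambda@Edge event travels as a JSON string in the SNS message body.
          const edgeEvent = JSON.parse(record.Sns.Message);
          const { request } = edgeEvent.Records[0].cf;
          const userAgent = (request.headers['user-agent'] || [{ value: '' }])[0].value;

          await db.execute(
            'INSERT INTO cf_requests (client_ip, method, uri, user_agent) VALUES (?, ?, ?, ?)',
            [request.clientIp, request.method, request.uri, userAgent]
          );
        }
      } finally {
        await db.end();
      }
    };
    ```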