Search code examples
amazon-web-servicesaws-lambdaamazon-cloudfront

CloudFront origin request custom S3 logic origin, is it still using edge location cached data?


I have setup a lambda@edge function that dynamically decides what bucket (origin) to use to get the s3 object from (uri..).

I want to make sure that what I am doing does not defeat the whole purpose of using CloudFront in the first place, meaning, allowing origins contents (s3 buckets) to be cached at edges so that users can get what they request fast from the closest edge location to them.

I am following this flow:

'use strict';

 const querystring = require('querystring');
 
 exports.handler = (event, context, callback) => {
     const request = event.Records[0].cf.request;
 
     /**
      * Reads query string to check if S3 origin should be used, and
      * if true, sets S3 origin properties.
      */
 
     const params = querystring.parse(request.querystring);
 
     if (params['useS3Origin']) {
         if (params['useS3Origin'] === 'true') {
             const s3DomainName = 'my-bucket.s3.amazonaws.com';
 
             /* Set S3 origin fields */
             request.origin = {
                 s3: {
                     domainName: s3DomainName,
                     region: '',
                     authMethod: 'none',
                     path: '',
                     customHeaders: {}
                 }
             };
             request.headers['host'] = [{ key: 'host', value: s3DomainName}];
         }
     }
     
    callback(null, request);
};

That is just an example of what I am doing. ( found from https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-examples.html#lambda-examples-content-based-S3-origin-request-trigger ) I have setup an Origin-Request trigger that runs my lambda@adge function which will decide what bucket to use as origin.

When the lambda function sets the bucket (s3DomainName) as it gets run in the Origin-Request trigger, as long as that bucket is setup in CloudFront as one of the origin already, will CloudFront be able to serve the file down from cache still? Or am I bypassing the whole point of using CloudFront by setting the origin bucket like shown in the cod example?


Solution

  • I want to make sure that what I am doing does not defeat the whole purpose of using CloudFront

    It doesn't defeat the purpose. This works as intended and the response would still be cached using the same rules that would be used if this bucket were a statically configured origin and no trigger was in place.

    Origin Request triggers only fire when there's a cache miss -- after the cache is checked and the requested resource is not there, and CloudFront concludes that the request needs to be forwarded to an origin server. The Origin Request trigger causes CloudFront to briefly suspend its processing while your trigger code evaluates and possibly modifies the request before returning control to CloudFront. Changing/setting the origin server in the trigger modifies the destination where CloudFront will then send the request, but it doesn't have any impact on whether the response can/will be cached or otherwise change what CloudFront would do when the response arrives from the origin server. And, if a suitable response is already available in the cache when a new request arrives, the trigger won't fire at all, since there's nothing for it to do.

    as long as that bucket is setup in CloudFront as one of the origin already

    This isn't actually required. Any S3 bucket with publicly-accessible content can be specified as the origin server in an Origin Request trigger, even if it isn't one of the origins configured on the distribution.

    There are other rules and complexities that come into play if a trigger like this is desired and your buckets require authentication and are using KMS-based object encryption, but in those cases, there would still be no impact of this design on caching -- just some extra things you need to do to actually make authentication work between CloudFront and the bucket, since an Origin Access Identity (normally used to allow CloudFront to authenticate itself to S3) isn't sufficient for this use case.

    It's not directly related to the Origin Request trigger and the gist of this question, but noteworthy since you are using request.querystring: be sure you understand the concept of the cache key, which is the combination of request attributes that CloudFront will use when determining whether a future request can be served using a given cached object. If two similar requests result in different cache keys, then CloudFront won't treat them as identical requests and the response for one cannot be used as a cached response for the other, even though your expectation might have been otherwise, since the two request are for the same object. An Origin Request trigger can't change the cache key for a request, either accidentally or deliberately, since it's already been calculated by CloudFront by the time the trigger fires.

    meaning, allowing origins contents (s3 buckets) to be cached at edges so that users can get what they request fast from the closest edge location to them

    Ah. Having written all the above, it occurs to me now that perhaps your question might actually stem from an assumption I've seen people occasionally make about how CloudFront and S3 interact. It's an assumption that is understandable, but turns out to be inaccurate. When a bucket is configured as a CloudFront origin, there is no mechanism that pushes the bucket's content out to the CloudFront edges proactively. If you were working from a belief that this is something that happens when a bucket is configured as a CloudFront S3 origin, then this question takes on a different significance, since you're essentially asking whether the content from those various buckets would still arrive at CloudFront as expected... but CloudFront is what I call a "pull-through CDN." Objects are stored in the cache at a given edge only when someone requests them through that edge -- they're cached as they are fetched as a result of that initial request, not before. So your setup simply changes where CloudFront goes to fetch a given object. The rest of the system works the same as always, with CloudFront pulling and caching the content on demand.