
Serve index file instead of download prompt


I have my website hosted on S3 with CloudFront as a CDN, and I need these two URLs to behave the same and to serve the index.html file within the directory:

example.com/directory
example.com/directory/

The one with the trailing / incorrectly prompts the browser to download a zero-byte file whose name is a random hash. The one without the slash returns my 404 page.

How can I get both paths to deliver the index.html file within the directory?

If there's a way I'm "supposed" to do this, great! That's what I'm hoping for. If not, I'll probably try to use Lambda@Edge to do a redirect. I need that for some other situations anyway, so some instructions on how to do a 301 or 302 redirect from Lambda@Edge would be helpful too :)

Update (per John Hanley's comment)

curl -i https://www.example.com/directory/

HTTP/2 200 
content-type: application/x-directory
content-length: 0
date: Sat, 12 Jan 2019 22:07:47 GMT
last-modified: Wed, 31 Jan 2018 00:44:16 GMT
etag: "[id]"
accept-ranges: bytes
server: AmazonS3
x-cache: Miss from cloudfront
via: 1.1 [id].cloudfront.net (CloudFront)
x-amz-cf-id: [id]

Update

CloudFront has one behavior, which redirects HTTP to HTTPS and sends requests to S3. It also has a custom 404 error response configured under the Error Pages tab.


Solution

  • S3 serves index documents automatically only when you have enabled static website hosting on the bucket and you point CloudFront at the bucket's website hosting endpoint, ${bucket}.s3-website.${region}.amazonaws.com (or ${bucket}.s3-website-${region}.amazonaws.com, depending on the region), rather than at the bucket's generic REST endpoint, ${bucket}.s3.amazonaws.com.

    Website endpoints and REST endpoints differ in a number of ways, and this is one of them.

    You're seeing these 0-byte files for object keys ending in / because folder objects were created in the bucket with the S3 console or another tool that actually creates 0-byte objects. Once a folder has objects "in" it, these markers aren't needed, but they're the only way to display an empty folder in the S3 console, which shows an object named foo/ as a folder named foo even if no other objects have the key prefix foo/. This is part of the console's visual emulation of a folder hierarchy; objects in S3 are never really "in" folders.
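    If you want to find these placeholder objects, here is a minimal sketch. It assumes you already have a listing in the shape returned by S3's ListObjectsV2 (Contents entries with Key and Size fields); the helper name findFolderPlaceholders and the sample keys are hypothetical.

    ```javascript
    'use strict';

    // Hypothetical helper: given ListObjectsV2-style entries ({ Key, Size }),
    // return the keys of zero-byte "folder" marker objects such as "directory/".
    function findFolderPlaceholders(contents) {
      return contents
        .filter((obj) => obj.Key.endsWith('/') && obj.Size === 0)
        .map((obj) => obj.Key);
    }

    // Hypothetical listing: one folder marker, a real index document inside it,
    // and an unrelated file at the top level.
    const listing = [
      { Key: 'directory/', Size: 0 },
      { Key: 'directory/index.html', Size: 1024 },
      { Key: 'style.css', Size: 2048 },
    ];

    console.log(findFolderPlaceholders(listing)); // [ 'directory/' ]
    ```

    Removing such a marker once the folder has real objects in it only changes what the console shows; it does not make the REST endpoint return index.html for the folder path.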

    If you need to use the REST endpoint for some reason (for example, because you don't want to make the bucket public), you need two Lambda@Edge triggers in CloudFront to emulate this functionality fairly closely.

    An Origin Request trigger can inspect and modify a request after the CloudFront cache has been checked and before the request is sent to the origin. We use it to check for a path ending in / and, if we find one, append index.html.

    An Origin Response trigger can inspect and potentially modify responses before they are written into the CloudFront cache. It can also inspect the request that generated the response. We use this to check whether the response is an error (403 or 404). If it is, we check whether the request appears to be for an index document or for a file. Here, a path looks like a "file" if, after the final slash, it has at least one character, then a dot, then at least one more character. If the request is for neither, we redirect to the original path with a trailing / appended.
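    The path logic of the two triggers can be pulled out into pure functions and checked locally. This sketch mirrors the heuristics just described; the function names are hypothetical, and it traces the two URLs from the question through a cache miss.

    ```javascript
    'use strict';

    // Hypothetical pure-function versions of the two trigger branches.
    const INDEX_DOCUMENT = 'index.html';

    // Origin request: append the index document to paths ending in '/'.
    function originRequestUri(uri) {
      return uri.endsWith('/') ? uri + INDEX_DOCUMENT : uri;
    }

    // Origin response: return the redirect target for a 403/404, or null to
    // leave the response unchanged. status is a string, as in Lambda@Edge events.
    function redirectTarget(uri, status) {
      if (!/^40[34]$/.test(status)) return null;           // not a 403/404
      if (uri.endsWith('/' + INDEX_DOCUMENT)) return null; // index document really missing
      if (uri.endsWith('/')) return null;                  // already has a trailing slash
      if (/[^\/]+\.[^\/]+$/.test(uri)) return null;        // looks like "name.ext": a missing file
      return uri + '/';
    }

    // "/directory": passed through unchanged, S3 has no such key, so we redirect.
    console.log(originRequestUri('/directory'));        // /directory
    console.log(redirectTarget('/directory', '404'));   // /directory/
    // "/directory/": rewritten before it reaches S3.
    console.log(originRequestUri('/directory/'));       // /directory/index.html
    // A missing file keeps its error response.
    console.log(redirectTarget('/missing.css', '404')); // null
    ```

    Keeping the checks as plain functions makes the heuristics easy to unit test; the handler itself still has to read and write the CloudFront event objects.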

    Origin Request and Origin Response triggers fire only on cache misses. They sit on the origin side of CloudFront (the back side of the cache), so requests that can be served from the cache never invoke them.
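    Because a viewer-request trigger runs on every request, cache hits included, it is a reasonable place for general-purpose 301 or 302 redirects that don't depend on what the origin returns. A request trigger can return a generated response instead of the request, and CloudFront then answers the viewer without contacting the origin. A minimal sketch, assuming a hypothetical REDIRECTS table of exact paths:

    ```javascript
    'use strict';

    // Hypothetical redirect table (exact path -> new location); not part of the original answer.
    const REDIRECTS = new Map([
      ['/old-page', '/new-page/'],
      ['/blog', '/articles/'],
    ]);

    // Intended as a viewer-request trigger: it runs on every request, before the cache is checked.
    exports.handler = (event, context, callback) => {
      const request = event.Records[0].cf.request;
      const target = REDIRECTS.get(request.uri);

      if (target !== undefined) {
        // Returning a response object instead of the request makes CloudFront
        // reply to the viewer directly. Any query string (request.querystring)
        // is dropped in this sketch.
        return callback(null, {
          status: '301', // use '302' with 'Found' for a temporary redirect
          statusDescription: 'Moved Permanently',
          headers: {
            location: [{ key: 'Location', value: target }],
          },
        });
      }

      // No redirect for this path: pass the request through unchanged.
      return callback(null, request);
    };
    ```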

    The following Lambda@Edge function is written in Node.js 8.10. A single function acts as either the origin request or the origin response trigger, depending on the event type it receives. After publishing a version in Lambda, associate that version's ARN with the CloudFront cache behavior as both an Origin Request and an Origin Response trigger.

    'use strict';
    
    // combination origin-request, origin-response trigger to emulate the S3
    // website hosting index document functionality, while using the REST
    // endpoint for the bucket
    
    // https://stackoverflow.com/a/54263794/1695906
    
    const INDEX_DOCUMENT = 'index.html'; // do not prepend a slash to this value
    
    const HTTP_REDIRECT_CODE = '302'; // or use 301 or another code if desired
    const HTTP_REDIRECT_MESSAGE = 'Found'; 
    
    exports.handler = (event, context, callback) => {
        const cf = event.Records[0].cf;
    
        if(cf.config.eventType === 'origin-request')
        {
            // if path ends with '/' then append INDEX_DOCUMENT before sending to S3
            if(cf.request.uri.endsWith('/'))
            {
                cf.request.uri = cf.request.uri + INDEX_DOCUMENT;
            }
            // return control to CloudFront, to send request to S3, whether or not
            // we modified it; if we did, the modified URI will be requested.
            return callback(null, cf.request);
        }
        else if(cf.config.eventType === 'origin-response')
        {
            // is the response 403 or 404?  If not, we will return it unchanged.
            if(cf.response.status.match(/^40[34]$/))
            {
                // it's an error.
    
                // we're handling a response, but Lambda@Edge can still see the attributes of the request that generated this response; so, we
                // check whether this is a page that should be redirected with a trailing slash appended.  If it doesn't look like an index
                // document request, already, and it doesn't end in a slash, and doesn't look like a filename with an extension... we'll try that.
    
                // This is essentially what the S3 web site endpoint does if you hit a nonexistent key, so that the browser requests
                // the index with the correct relative path, except that S3 checks whether it will actually work.  We are using heuristics,
                // rather than checking the bucket, but checking is an alternative.
    
                if(!cf.request.uri.endsWith('/' + INDEX_DOCUMENT) && // not a failed request for an index document
                   !cf.request.uri.endsWith('/') && // unlikely, unless this code is modified to pass other things through on the request side
                   !cf.request.uri.match(/[^\/]+\.[^\/]+$/)) // doesn't look like a filename  with an extension
                {
                    // add the original error to the response headers, for reference/troubleshooting
                    cf.response.headers['x-redirect-reason'] = [{ key: 'X-Redirect-Reason', value: cf.response.status + ' ' + cf.response.statusDescription }];
                    // set the redirect code
                    cf.response.status = HTTP_REDIRECT_CODE;
                    cf.response.statusDescription = HTTP_REDIRECT_MESSAGE;
                    // set the Location header with the modified URI
                    // just append the '/', not the "index.html" -- the next request will trigger
                    // this function again, and it will be added without appearing in the
                    // browser's address bar.
                    cf.response.headers['location'] = [{ key: 'Location', value: cf.request.uri + '/' }];
                    // not strictly necessary, since browsers don't display it, but remove the response body with the S3 error XML in it
                    cf.response.body = '';
                }
            }
    
            // return control to CloudFront, with either the original response, or
            // the modified response, if we modified it.
    
            return callback(null, cf.response);
    
        }
        else // not intended as a viewer-side trigger: fail the invocation; the error is visible only in the Lambda CloudWatch logs, and the browser gets a 502.
        {
            return callback(`Lambda function is incorrectly configured; triggered on '${cf.config.eventType}' but expected 'origin-request' or 'origin-response'`);
        }
    
    };
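    For the association step, the relevant part of the CloudFront configuration is the cache behavior's LambdaFunctionAssociations list. The fragment below sketches that structure as a JavaScript object (for example, as part of a DistributionConfig passed to UpdateDistribution); the account ID, function name, and version number in the ARN are placeholders. Lambda@Edge functions are created in us-east-1 and must be referenced by a numbered version, not $LATEST or an alias.

    ```javascript
    'use strict';

    // Placeholder ARN: account ID, function name and version number are hypothetical.
    // Lambda@Edge requires a numbered version here, not $LATEST or an alias.
    const FUNCTION_VERSION_ARN =
      'arn:aws:lambda:us-east-1:123456789012:function:s3-index-document:1';

    // Fragment of a CloudFront cache behavior (e.g. DefaultCacheBehavior in a
    // DistributionConfig): the same version is attached to both origin-side events.
    const lambdaFunctionAssociations = {
      Quantity: 2,
      Items: [
        { LambdaFunctionARN: FUNCTION_VERSION_ARN, EventType: 'origin-request' },
        { LambdaFunctionARN: FUNCTION_VERSION_ARN, EventType: 'origin-response' },
      ],
    };

    console.log(JSON.stringify(lambdaFunctionAssociations, null, 2));
    ```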