
Is there a way to configure Amazon CloudFront to delay the time before my S3 object reaches clients by specifying a release date?


I would like to upload content to S3 but schedule a time at which CloudFront delivers it to clients, rather than having it vended immediately upon processing. Is there a configuration option to accomplish this?

EDIT: This time should be able to differ per object in S3.


Solution

  • There is something of a configuration option to allow this, and it does allow you to restrict specific files -- or path prefixes -- from being served up prior to a given date and time... though it's slightly... well, I don't even know what derogatory term to use to describe it. :) But it's the only thing I can come up with that uses entirely built-in functionality.

    First, a quick reminder that public/unauthenticated read access to objects in S3 can be granted at the bucket level with bucket policies, or at the object level, using "make everything public" when uploading the object in the console, or sending x-amz-acl: public-read when uploading via the API. If either of these grants is present, the object is publicly readable -- except in the face of any policy denying that same access. Deny always wins over Allow.

    So, we can create a bucket policy statement matching a specific file or prefix, denying access prior to a certain date and time.

    {
        "Version": "2012-10-17",
        "Id": "Policy1445197123468",
        "Statement": [
            {
                "Sid": "Stmt1445197117172",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::example-bucket/hello.txt",
                "Condition": {
                    "DateLessThan": {
                        "aws:CurrentTime": "2015-10-18T15:55:00.000-0400"
                    }
                }
            }
        ]
    }
    

    Using a wildcard would allow everything under a specific path to be subject to the same restriction.

    "Resource": "arn:aws:s3:::example-bucket/cant/see/these/yet/*",
    

    This works, even if the object is public.

    This example blocks all GET requests for matching objects by anybody, regardless of permissions they may have. Signed URLs, etc., are not sufficient to override this policy.

    The policy statement is checked for validity when it is created; however, the object being matched does not have to exist yet, so creating the policy before the object does not make the policy invalid.
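
    If you find yourself creating these statements for many objects, it can help to generate them programmatically. A minimal sketch in Python (the function name and Sid are my own; the boto3 call that would apply the policy is left commented out, since it needs real credentials and an existing bucket):

```python
import json

def deny_before_policy(bucket, key_or_prefix, release_iso8601):
    """Build a bucket policy that denies s3:GetObject on the given key
    (or wildcard prefix) until the given ISO-8601 timestamp."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EmbargoUntilRelease",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{key_or_prefix}",
                "Condition": {
                    "DateLessThan": {"aws:CurrentTime": release_iso8601}
                },
            }
        ],
    }

policy = deny_before_policy("example-bucket", "hello.txt",
                            "2015-10-18T15:55:00-04:00")
print(json.dumps(policy, indent=4))

# To apply it (requires boto3 and AWS credentials; not run here):
# import boto3
# boto3.client("s3").put_bucket_policy(
#     Bucket="example-bucket", Policy=json.dumps(policy))
```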

    Live test:

    Before the expiration time: (unrelated request/response headers removed for clarity)

    $ curl -v example-bucket.s3.amazonaws.com/hello.txt
    > GET /hello.txt HTTP/1.1
    > Host: example-bucket.s3.amazonaws.com
    > Accept: */*
    >
    < HTTP/1.1 403 Forbidden
    < Content-Type: application/xml
    < Transfer-Encoding: chunked
    < Date: Sun, 18 Oct 2015 19:54:55 GMT
    < Server: AmazonS3
    <
    <?xml version="1.0" encoding="UTF-8"?>
    * Connection #0 to host example-bucket.s3.amazonaws.com left intact
    <Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>AAAABBBBCCCCDDDD</RequestId><HostId>g0bbl3dyg00kbunc4Ofl1n3n0iz3h3rehahahasqlbot1337kenqweqwel24234kj41l1ke</HostId></Error>
    

    After the specified date and time:

    $ curl -v example-bucket.s3.amazonaws.com/hello.txt
    > GET /hello.txt HTTP/1.1
    > Host: example-bucket.s3.amazonaws.com
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Date: Sun, 18 Oct 2015 19:55:05 GMT
    < Last-Modified: Sun, 18 Oct 2015 19:36:17 GMT
    < ETag: "78016cea74c298162366b9f86bfc3b16"
    < Accept-Ranges: bytes
    < Content-Type: text/plain
    < Content-Length: 15
    < Server: AmazonS3
    <
    Hello, world!
    

    These tests were done against the S3 REST endpoint for the bucket, but the website endpoint for the same bucket yields the same results -- only the error message is in HTML rather than XML.

    The positive aspect of this policy is that since the object is public, the policy can be removed any time after the date passes, because it is denying access before a certain time, rather than allowing access after a certain time -- logically the same, but implemented differently. (If the policy allowed access after rather than denying access before, the policy would have to stick around indefinitely; this way, it can just be deleted.)

    You could use custom error documents in either S3 or CloudFront to present the viewer with a slightly nicer output... probably CloudFront, since you can customize each error code individually, creating a custom 403 page.

    The major drawbacks to this approach are, of course, that the policy must be edited for each object or path prefix, and that even though it works per object, the release time isn't something that's actually stored per object.

    And there is a limit to how many policy statements you can include, because of the size restriction on bucket policies:

    Note

    Bucket policies are limited to 20 KB in size.

    http://docs.aws.amazon.com/AmazonS3/latest/dev/access-policy-language-overview.html
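
    You can estimate in advance whether a generated policy will fit. A small sketch (I'm assuming 20 KB means 20 × 1024 bytes of serialized JSON; S3's exact accounting may differ slightly, so treat this as a rough pre-check):

```python
import json

BUCKET_POLICY_LIMIT = 20 * 1024  # 20 KB, per the S3 documentation

def policy_fits(policy_document):
    """Return True if the serialized policy is within S3's 20 KB limit."""
    return len(json.dumps(policy_document).encode("utf-8")) <= BUCKET_POLICY_LIMIT

statement = {
    "Sid": "Stmt0",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-bucket/cant/see/these/yet/*",
    "Condition": {
        "DateLessThan": {"aws:CurrentTime": "2015-10-18T15:55:00-04:00"}
    },
}
single = {"Version": "2012-10-17", "Statement": [statement]}
print(policy_fits(single))  # True: one statement is far under the limit
```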


    The other solution that comes to mind involves deploying a reverse proxy component (such as HAProxy) in EC2 between CloudFront and the bucket, passing the requests through and reading the custom metadata from the object's response headers, looking for a header such as x-amz-meta-embargo-until: 2015-10-18T19:55:00Z and comparing its value to the system clock. If the current time is before the cutoff time, the proxy would drop the connection from S3 and replace the response headers and body with a locally-generated 403 message, so the client would not be able to fetch the object until the designated time had passed.
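
    The decision logic such a proxy would apply can be sketched independently of HAProxy. This is only an illustration of the embargo check, not the proxy itself; the header name comes from the sketch above, and I'm assuming a fixed UTC timestamp format:

```python
from datetime import datetime, timezone

EMBARGO_HEADER = "x-amz-meta-embargo-until"  # custom metadata name from the sketch above

def is_embargoed(response_headers, now=None):
    """Return True if the object's embargo timestamp (if any) is still
    in the future, i.e. the proxy should answer 403 instead of relaying."""
    value = response_headers.get(EMBARGO_HEADER)
    if value is None:
        return False  # no embargo metadata: serve normally
    release = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now < release

headers = {"x-amz-meta-embargo-until": "2015-10-18T19:55:00Z"}
before = datetime(2015, 10, 18, 19, 54, 55, tzinfo=timezone.utc)
after = datetime(2015, 10, 18, 19, 55, 5, tzinfo=timezone.utc)
print(is_embargoed(headers, before))  # True: proxy would return a 403
print(is_embargoed(headers, after))   # False: proxy relays the object
```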

    This solution seems fairly straightforward to implement, but it requires a non-built-in component, so it doesn't meet the constraint of the question, and I haven't built a proof of concept. However, I already use HAProxy with Lua in front of some buckets to give S3 other capabilities not offered natively, such as removing sensitive custom metadata from responses, and modifying the XML of S3 error responses and directing the browser to apply an XSL stylesheet to it, so there's no obvious reason that comes to mind why this application wouldn't work equally well.