AWS CloudFront - how to set cache headers for HTML files?


I'm new to AWS CloudFront. I have a simple question that I can't seem to figure out.

I have a dynamic site, which is really just a CMS that lets editors enter articles. The CMS then generates static HTML files (we're using the Boost module under Drupal 6).

So what I can't figure out is how to set the cache headers for the HTML files to achieve this outcome:

I want CloudFront to keep the HTML files for at least an hour, but it must check that a file has not been modified. If a file has been modified, CloudFront must pick up the new version no later than 5 minutes afterwards.

Am I making sense?

What I've come up with is:

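<FilesMatch "\.html(\.gz)?$">
    # mod_expires ignores Expires* directives unless it is switched on
    ExpiresActive On
    # A300 = expire 300 seconds (5 minutes) after access
    ExpiresByType text/html A300
    Header append Cache-Control "must-revalidate"
</FilesMatch>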

Will CloudFront keep the files after 5 minutes if no change was made?

I've been having trouble testing this myself, because I'm getting different results in different browsers.


Solution

  • The first thing to understand is that CloudFront is a cache. A cache does not check whether a file has changed while its copy is still fresh; it simply keeps serving whatever is cached until that copy expires.

    The code you posted tells your Apache web server how to serve files; it is not a CloudFront setting. CloudFront (being a cache itself) has its own cache settings, which are configured on the CloudFront side rather than in Apache.

    By default, CloudFront caches an object for 24 hours. You can configure it to cache for as little as 1 hour (or perhaps 1 minute nowadays; I don't recall off-hand). If you need cached content dropped sooner than that, you can request an "invalidation" via the AWS Console or the web service API (you didn't specify how you're interacting with CloudFront).

    If you want the cached copy dropped as soon as content changes, your Drupal module will need to send CloudFront an invalidation request whenever it produces new HTML. CloudFront will not check on its own (again, because caches don't do that).

    From personal experience, it typically takes CloudFront anywhere from 3 to 15 minutes to clear an invalidated file from all of the cache servers it runs around the world, after which it pulls your fresh content.

    Does this make sense?
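
To make the invalidation step above concrete, here is a minimal sketch in Python, assuming you talk to CloudFront through boto3. The function name `invalidate_paths`, the `boost-` prefix on the caller reference, and the distribution ID in the usage comment are all made up for illustration. A Drupal module would do the equivalent from PHP, so treat this as an outline of the request rather than something to drop in as-is.

```python
import time


def invalidate_paths(client, distribution_id, paths):
    """Ask CloudFront to drop its cached copies of the given paths.

    `client` is assumed to be a boto3 CloudFront client, e.g.
    boto3.client("cloudfront"). It is passed in rather than created here
    so the function can be exercised without AWS credentials.
    """
    # CloudFront expects every invalidation path to start with "/".
    items = [p if p.startswith("/") else "/" + p for p in paths]

    response = client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": len(items), "Items": items},
            # Must be unique per request; a timestamp is one simple choice.
            "CallerReference": "boost-%d" % time.time_ns(),
        },
    )
    # The invalidation ID can be used later to poll for completion.
    return response["Invalidation"]["Id"]


# Hypothetical usage (the distribution ID and path are placeholders):
#   import boto3
#   invalidate_paths(boto3.client("cloudfront"),
#                    "EDFDVBD6EXAMPLE",
#                    ["/articles/my-article.html"])
```

You would call something like this from whatever hook runs when the CMS regenerates a page, passing only the paths that actually changed.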