
default_storage.exists extremely slow and frequently times out


We have just migrated a Django project to Heroku and put all of our media and static files on Amazon S3 (using django-storages and s3-boto).

Despite everything I've heard about Amazon S3 being very fast, and despite finding very few reports of slow performance, our image loading has slowed to an absolute crawl and frequently times out. One example of code that times out is this property on one of our models, which tries to select an appropriate image and ultimately falls back to returning None:

@property
def photo(self):
    """Transparently serve the best available image for templates"""
    if self.model_shot.storage.exists(self.model_shot.name):
        return self.model_shot
    elif self.image.storage.exists(self.image.name):
        return self.image
    else:
        return None

When I tested on a model which was causing problems, I tried this:

$ heroku run python manage.py shell
...
>>> design = Design.objects.get(pk=10210)
>>> design.photo

This command caused the shell to hang for several seconds before finally returning an ImageFieldFile object. Subsequent calls to it returned instantaneously, which makes me believe the result is cached.
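To make the cost of the property concrete: with a remote backend, each storage.exists() call is typically a network round trip, and the property above can make up to two of them on every access. Below is a minimal sketch of memoizing the result per instance, assuming Python 3.8+ for functools.cached_property (Django also ships django.utils.functional.cached_property). FakeStorage and FakeFieldFile are hypothetical stand-ins so the example runs without Django or S3; they are not part of django-storages.

```python
from functools import cached_property


class FakeStorage:
    """Hypothetical stand-in for a remote storage backend; counts lookups."""

    def __init__(self, existing_names):
        self.existing_names = set(existing_names)
        self.exists_calls = 0

    def exists(self, name):
        # With a real S3 backend this would be a network round trip.
        self.exists_calls += 1
        return name in self.existing_names


class FakeFieldFile:
    """Hypothetical stand-in for Django's ImageFieldFile."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage


class Design:
    def __init__(self, model_shot, image):
        self.model_shot = model_shot
        self.image = image

    @cached_property
    def photo(self):
        """Return the best available image; computed once per instance."""
        for candidate in (self.model_shot, self.image):
            # Skip empty fields before asking the storage backend anything.
            if candidate.name and candidate.storage.exists(candidate.name):
                return candidate
        return None


storage = FakeStorage({"designs/10210.jpg"})
design = Design(
    model_shot=FakeFieldFile("", storage),
    image=FakeFieldFile("designs/10210.jpg", storage),
)
first = design.photo   # one storage lookup (the empty field is skipped)
second = design.photo  # served from the per-instance cache
```

This only removes repeated lookups on the same object; the first access still pays for the storage call.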

My question is: what is the best way to handle this? I have heard a lot about using CloudFront in situations like this, but this is definitely not due to high traffic (our site should have basically no traffic yet). Some other caching framework? Something else entirely?

Most of the images in question are 1000x1000 px at least.


Solution

  • Switching to CloudFront completely solved this issue and was relatively easy (no code changes beyond one setting; just more monkeying around with the Amazon Console), so I decided to answer my own question.

    tl;dr Do not serve files directly from S3; set up CloudFront.


    Serving an S3 Bucket via CloudFront

    Step 0: If you haven't already, make sure your bucket name complies with the "best practices" for naming buckets. They don't necessarily make this obvious in all the places they should, but a bad bucket name can completely break its interoperability with other Amazon Web Services. The best thing to do is give your bucket an all-lowercase name that's not too long (S3 bucket names must be 3 to 63 characters).
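    A quick way to sanity-check a name before creating the bucket is a small validator. This is a minimal sketch, not an official AWS tool: is_safe_bucket_name is a hypothetical helper, and it is deliberately stricter than S3 itself because it also rejects dots, which are legal but cause certificate mismatches on https://bucketname.s3.amazonaws.com URLs.

```python
import re

# 3-63 characters; lowercase letters, digits and hyphens only;
# must start and end with a letter or digit.
# Assumption: dots are rejected on purpose (stricter than S3 requires).
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_safe_bucket_name(name):
    """Return True if the name follows the conservative rules above."""
    return _BUCKET_NAME.fullmatch(name) is not None


checks = {
    name: is_safe_bucket_name(name)
    for name in ("my-media-bucket", "My_Bucket", "ab", "media.example.com")
}
```

    Only the first of those four example names passes.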

    Step 1: In order to get CloudFront to serve files from your bucket, you need to set it up as if to serve a static website. You can do this on the Amazon AWS console from your bucket's Permissions tab. Amazon has several places where there are instructions/documentation for this; IMO the clearest are these. IMPORTANT: Make sure you set up the Default Root Object to index.html -- that file doesn't even have to exist, but that setting does.

    Step 1.5 [possibly optional]: Make sure the permissions on your bucket are correct. Even though I was serving files from S3 with no problem, switching to CloudFront turned every request into a 403: Access Forbidden error. If in doubt, and your files are not sensitive, you can right-click on folders of your bucket in the AWS Console and click Make Public. WARNING: This can be a very time-consuming process, and for some stupid reason (even though it's server side) your browser session has to stay open. Do this first and don't close your session. For our bucket, this took about 16 hours. :/

    Step 2: Go to the Amazon CloudFront section in the AWS Console and click the Create Distribution button. Make it a web distribution (default) and use the domain you generated by setting your bucket up for static web distribution in the previous step as the origin. Again, IMO, these are the clearest and most straightforward instructions in the AWS docs. You can leave just about everything default here. Once it's created, just wait until it's listed on the console as "Deployed".

    Step 3: Configure your app to serve from CloudFront rather than S3. This is the easiest part because the URLs map transparently from https://bucketname.s3.amazonaws.com/path to https://somerandomstring.cloudfront.net/path (bonus: you can set up a CNAME record so that something like media.yourdomain.tld points to the latter; we didn't do this, so I won't go into it here). Since I'm using Django with a combination of django-storages and s3-boto, this ended up being a simple matter of setting that CloudFront domain in settings.py:

    AWS_S3_CUSTOM_DOMAIN = 'd2ynhpzeiwwiom.cloudfront.net'
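
    To illustrate the URL mapping described in Step 3, here is a minimal sketch that swaps only the host and leaves the path untouched. to_cloudfront_url is a hypothetical helper for illustration; with the setting above, django-storages builds the URLs itself, so this function is not needed in the app.

```python
from urllib.parse import urlsplit, urlunsplit

# Value taken from the settings line above.
AWS_S3_CUSTOM_DOMAIN = "d2ynhpzeiwwiom.cloudfront.net"


def to_cloudfront_url(s3_url, custom_domain=AWS_S3_CUSTOM_DOMAIN):
    """Replace the host of a bucket URL, keeping scheme, path and query."""
    parts = urlsplit(s3_url)
    return urlunsplit(
        (parts.scheme, custom_domain, parts.path, parts.query, parts.fragment)
    )


example = to_cloudfront_url(
    "https://bucketname.s3.amazonaws.com/path/to/image.jpg"
)
```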
    

    And that's it! With these changes, all of our speed woes went away, and our media-rich pages (6-20 MP worth of images per page) suddenly load faster than ever!