
Serve static files in Flask from private AWS S3 bucket


I am developing a Flask app running on Heroku that allows users to upload images. The app has a page displaying the user's images in a table.

For development purposes, I was saving the uploaded files to Heroku's ephemeral filesystem, and everything worked fine: the images were correctly loaded and displayed (I am using the last method shown here, which relies on send_from_directory()). Now I have moved the storage to S3 and am adapting the code. I use boto3 to upload the files to the bucket, and that works fine. My doubts concern the download step needed to populate users' pages with their images.

As explained here, I could set the files to "public-read" and use their URLs directly (I think this is what Flask-S3 does), but I'd prefer not to leave the files freely accessible. So my attempted solution is to download each file to Heroku's filesystem and serve the image again with send_from_directory() as follows:

app.py

import os

import boto3
from flask import current_app, send_from_directory


@app.route('/download/<resource>')
def download_image(resource):
    """resource: name of the file to download"""
    s3 = boto3.client('s3',
                      aws_access_key_id=current_app.config['S3_ACCESS_KEY'],
                      aws_secret_access_key=current_app.config['S3_SECRET_KEY'])

    s3.download_file(current_app.config['S3_BUCKET_NAME'],
                     resource,
                     os.path.join('tmp', resource))

    return send_from_directory('tmp',  # Heroku's filesystem
                               resource,
                               as_attachment=False)

Then, in the template I generate the URL for the image as follows:

...
<img src="{{ url_for('app.download_image',
                     resource=resource) }}" height="120" width="120">
...

It works, but I don't think this is the proper way, for several reasons: among them, I would have to manage Heroku's ephemeral filesystem to avoid using up all the space between dyno restarts (i.e. delete the images from the filesystem after serving them).
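For the cleanup part, one option I could use (a sketch, not something in my current code) is to delete the temporary copy once Flask has built the response, via after_this_request:

```python
import os
from flask import Flask, after_this_request, send_from_directory

app = Flask(__name__)

@app.route('/download/<resource>')
def download_image(resource):
    """Serve a file from the ephemeral filesystem, then delete it."""
    path = os.path.join(app.root_path, 'tmp', resource)
    # ... s3.download_file(...) into `path`, as in the view above ...

    @after_this_request
    def cleanup(response):
        try:
            os.remove(path)  # reclaim ephemeral storage once the response is built
        except OSError:
            pass
        return response

    return send_from_directory('tmp', resource, as_attachment=False)
```

This only bounds the space used per request, though; it does not remove the extra download/upload work on the dyno.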

What is the best/preferred way, also considering performance? Thanks a lot.


Solution

  • The preferred way is to simply create a pre-signed URL for the image and return a redirect to that URL. This keeps the files private in S3, but generates a temporary, time-limited URL that can be used to download the file directly from S3. That greatly reduces both the amount of work happening on your server and the amount of data transfer your server consumes. Something like this:

    import boto3
    from flask import current_app, redirect

    @app.route('/download/<resource>')
    def download_image(resource):
        """resource: name of the file to download"""
        s3 = boto3.client('s3',
                          aws_access_key_id=current_app.config['S3_ACCESS_KEY'],
                          aws_secret_access_key=current_app.config['S3_SECRET_KEY'])

        # Pass the actual bucket name from config, not the literal string
        url = s3.generate_presigned_url('get_object',
                                        Params={'Bucket': current_app.config['S3_BUCKET_NAME'],
                                                'Key': resource},
                                        ExpiresIn=100)  # seconds of validity
        return redirect(url, code=302)
    

    If you don't like that solution, you should at least look into streaming the file contents from S3 instead of writing them to the filesystem.