Tags: python, amazon-s3, flask, io, python-imaging-library

Flask web server corrupts image when processing and uploading to S3


I'm trying to store images on S3 from a Flask webserver. The server receives the image and processes it to create two copies (compressed + thumbnail), then uploads all three.

The two processed images come through fine, but the original gets corrupted. The code doesn't throw any errors.

This is all using Python 3.6, Flask 1.0.2, and Boto3 1.9.88.

Below is an excerpt from the code for the upload page:

form = UploadForm()
if form.validate_on_submit():
    photo = form.photo.data

    name, ext = os.path.splitext(form.photo.data.filename)

    photo_comp, photo_thum = get_compressions(photo, 'JPEG')

    pic_set = {}

    pic_set['original'] = photo
    pic_set['compressed'] = photo_comp
    pic_set['thumbs'] = photo_thum

    for pic in pic_set:
        output = upload_to_s3(file=pic_set[pic],
                              username=current_user.username,
                              filetype=pic,
                              bucket_name=current_app.config['S3_BUCKET'])

The function 'get_compressions()' produces a reduced-size JPEG of the file plus a thumbnail:

def get_compressions(file, filetype):
    # Creates new compressed and thumbnail copies. Checks for alpha
    # channel, removes if present, resaves as compressed .jpeg, then
    # wraps into a Werkzeug FileStorage type.

    name, ext = os.path.splitext(file.filename)

    temp_compress = BytesIO()
    temp_thumb = BytesIO()

    image = Image.open(file)

    if image.mode in ['RGBA', 'LA', 'RGBa']:
        image2 = Image.new('RGB', image.size, '#ffffff')
        image2.paste(image, None, image)
        image = image2.copy()

    image.save(temp_compress, format=filetype, quality=85, optimize=True)
    image.thumbnail((400, 400), Image.ANTIALIAS)
    image.save(temp_thumb, format=filetype, optimize=True)

    temp_thumb.seek(0)
    temp_compress.seek(0)

    file_comp = FileStorage(stream=temp_compress,
                            filename=name + '.' + filetype,
                            content_type='image/jpg',
                            name=file.name,
                            )

    file_thum = FileStorage(stream=temp_thumb,
                            filename=name + '.' + filetype,
                            content_type='image/jpg',
                            name=file.name,
                            )

    return file_comp, file_thum

Finally, the 'upload_to_s3()' function is a fairly straightforward save to AWS S3:

def upload_to_s3(file, username, filetype, bucket_name, acl=os.environ.get('AWS_DEFAULT_ACL')):
    s3.upload_fileobj(
        Fileobj=file,
        Bucket=bucket_name,
        Key="{x}/{y}/{z}".format(x=username, y=filetype, z=file.filename),
        ExtraArgs={'ContentType': file.content_type}
    )
    print('Upload successful: ', file.filename)
    return file.filename

My belief is that the compression is affecting the upload of the original file object: while image.save() writes its output to a separate BytesIO object, the act of compressing appears to be affecting the original object somehow.

When trying to research this, I noted that Flask is multithreaded by default, and that the Python GIL doesn't apply to I/O operations or image processing; I'm not sure if this could be relevant.

Two options I tried to fix this were:

  1. Changing the order of execution so the original is uploaded first, then compression runs, then the compressed copies are uploaded, but this resulted in 'ValueError: I/O operation on a closed file'.

  2. Using copy.deepcopy() to make a new object prior to calling get_compressions(), but this resulted in "TypeError: cannot serialize '_io.BufferedRandom' object".

I'm not really sure how to proceed! I could potentially upload the original and have the server process the compression in the background (based on the uploaded file), but this presents an issue for the client, who wants to immediately retrieve the compressed version to load the page.


Solution

  • In your get_compressions function, you're reading the original file, which is a FileStorage object, so the file pointer ends up at the end of the file and you end up writing a zero-byte file to S3. You need to seek back to the start of the file, just as you've done for the compressed versions:

    file.seek(0)                                                                
    temp_thumb.seek(0)                                                          
    temp_compress.seek(0)
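
For context, a minimal sketch of how the tail end of get_compressions could look with that extra seek in place (everything else in the function stays the same):

    image.save(temp_compress, format=filetype, quality=85, optimize=True)
    image.thumbnail((400, 400), Image.ANTIALIAS)
    image.save(temp_thumb, format=filetype, optimize=True)

    # Image.open(file) consumed the original FileStorage stream, and the two
    # BytesIO buffers were just written to, so all three pointers sit at the
    # end of their streams. Rewind them before upload_fileobj reads them.
    file.seek(0)
    temp_thumb.seek(0)
    temp_compress.seek(0)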