Tags: google-chrome, firefox, image-processing, amazon-s3, corrupt

S3-imported images occasionally corrupted: render in Chrome, but not Firefox


Part of our app re-hosts remote images onto our own S3 bucket and displays the results on our site. In roughly 3% of cases, the import produces an image that Firefox believes is corrupt, logging an "Image corrupt or truncated." error in the console. The same image renders perfectly fine in Chrome, except in the Print Page dialog, where it exhibits the same behavior as Firefox. It seems to work in Safari in all cases.

The problem persists when I re-import the same file to S3, or even to another host like imgur.com. I created a fiddle where you can test the load behavior for yourself using a known corrupted image:

https://jsfiddle.net/ysLa27bo/1/

require 'aws-sdk-s3'
require 'open-uri'  # lets open() read a remote URL

s3 = Aws::S3::Resource.new(region: 'us-west-1')
obj = s3.bucket(MY_BUCKET_NAME).object(MY_S3_DIR_PATH)
obj.put(body: open(REMOTE_PATH_TO_IMAGE), acl: 'public-read')

Above is my S3 import code, using the aws-sdk-s3 Ruby gem and run through a Sidekiq worker in a Rails 5+ environment. I should stress that the corruption is intermittent: the other 97% of my imports work perfectly fine in all browsers and settings, so I don't think it's an issue with my code.

My best guess is that the image gets corrupted at one of two steps: at open() (reading the remote image URL) or during the upload to S3. Since the images load perfectly fine in Chrome, I think the latter step is the more likely culprit. Is it possible that Sidekiq shutting down during an app deploy/restart (Heroku) might corrupt the file import somehow? Just a guess.
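
If a shutdown mid-import really can truncate the file, one way I've considered ruling it out is to buffer the whole download to disk, sanity-check it, and only then upload it. A sketch of that idea (import_image and its arguments are hypothetical names, not code from the real app):

require 'aws-sdk-s3'
require 'open-uri'
require 'tempfile'

# Hypothetical helper: download fully to a Tempfile before touching S3,
# so there is never a half-read stream sitting inside put().
def import_image(remote_url, bucket_name, key)
  tmp = Tempfile.new(['import', File.extname(key)])
  tmp.binmode
  URI.open(remote_url) { |remote| IO.copy_stream(remote, tmp) }  # URI.open is Ruby 2.5+
  tmp.flush
  raise "empty download: #{remote_url}" if tmp.size.zero?

  s3 = Aws::S3::Resource.new(region: 'us-west-1')
  s3.bucket(bucket_name).object(key).upload_file(tmp.path, acl: 'public-read')
ensure
  tmp.close! if tmp
end

upload_file reads from a finished file on disk and handles multipart uploads for large objects, so nothing is streamed between the download and the upload.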

My questions are: Why does this happen? How do I ensure image corruption doesn't occur when I import to S3? Is there any way for me to automatically check the image's validity post-import, short of running a Firefox-driven Selenium instance?
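
For that last question, one browserless idea: ask ImageMagick whether the uploaded bytes parse as an image. A sketch assuming the mini_magick gem and an ImageMagick install, neither of which is part of my stack above; S3_PUBLIC_URL and reimport_image are hypothetical placeholders:

require 'mini_magick'
require 'open-uri'

# valid? shells out to ImageMagick's identify, which rejects truncated data.
def image_valid?(url)
  MiniMagick::Image.open(url).valid?
rescue OpenURI::HTTPError, MiniMagick::Error
  false
end

reimport_image(key) unless image_valid?(S3_PUBLIC_URL)  # hypothetical retry hook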


Solution

  • I've been able to prevent further instances of this happening by explicitly including the content type in my obj.put call. That line now looks something like:

    obj.put(body: open(REMOTE_PATH_TO_IMAGE), acl: 'public-read', content_type: "image/jpeg")

    It's interesting to note that setting the content type on a corrupted image after it has been imported to S3 will not un-corrupt it. So the issue isn't merely that S3 serves the image with the wrong headers; the stored bytes themselves get corrupted at import time. I had a strong hunch about this at the time, but this gave me more clarity into the issue.
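
    If your imports aren't always JPEGs, hard-coding image/jpeg won't cover every case. One option (a sketch, assuming the remote server reports an accurate Content-Type) is to reuse the MIME type that open-uri already extracts from the source response:

    require 'open-uri'

    io = open(REMOTE_PATH_TO_IMAGE)  # open-uri extends the IO with OpenURI::Meta
    obj.put(body: io, acl: 'public-read', content_type: io.content_type)

    If the source's header can't be trusted, a content-sniffing gem such as Marcel or MimeMagic could be substituted.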