python google-drive-api python-imaging-library stringio bytesio

PIL "cannot identify image file" for a Google Drive image streamed into io.BytesIO

I am using the Drive API to download an image. Following their file downloading documentation in Python, I end up with a variable fh that is a populated io.BytesIO instance. I try to save it as an image:

file_id = "0BwyLGoHzn5uIOHVycFZpSEwycnViUjFYQXR5Nnp6QjBrLXJR"
request = service.files().get_media(fileId=file_id)
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while not done:
    status, done = downloader.next_chunk()
    print('Download {} {}%.'.format(file['name'],
                                    int(status.progress() * 100)))
# Rewind once, after the last chunk; seeking inside the loop would make
# the next chunk overwrite the start of the buffer.
fh.seek(0)
image = Image.open(fh)  # error

The error is: cannot identify image file <_io.BytesIO object at 0x106cba890>. The error does not occur with every image, but it is thrown for most of them, including the one I linked at the beginning of this post.

After reading this answer I changed that last line to:

byteImg = fh.read()
dataBytesIO = io.BytesIO(byteImg)
image = Image.open(dataBytesIO) # still the same error

I've also tried this answer, changing the last line of my first code block to:

byteImg = fh.read()
image = Image.open(StringIO(byteImg))

But I still get a cannot identify image file <StringIO.StringIO instance at 0x106471e60> error.

I've tried alternatives (requests, urllib) with no success. I can Image.open the image if I download it manually.

This error was not present a month ago and has only recently appeared in the application this code is part of. I've spent days debugging it with no success and have finally brought the issue to Stack Overflow. I am using from PIL import Image.

Solution

Ditch the Drive service's MediaIoBaseDownload. Instead, use the webContentLink property of a media file (a link for downloading the content of the file in a browser, only available for files with binary content). Read more here.
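The snippets here assume a file dictionary that already carries webContentLink. As a minimal sketch, assuming service is an authorized Drive v3 client built with googleapiclient (v3 is suggested by the question's use of file['name']), the link could be requested explicitly through the fields parameter. The helper name get_web_content_link is hypothetical and not part of the Drive API.

```python
def get_web_content_link(service, file_id):
    """Return the file's webContentLink, or None if Drive did not report one.

    `service` is assumed to be an authorized Drive v3 client. The helper
    name is illustrative only.
    """
    # Ask only for the fields we need; webContentLink is requested by name.
    metadata = (
        service.files()
        .get(fileId=file_id, fields="name, webContentLink")
        .execute()
    )
    # Files without binary content (e.g. Google Docs) have no such link.
    return metadata.get("webContentLink")
```

A None result means the file has no downloadable binary content, so there is nothing to stream.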

With that content link, we can use an alternate form of streaming, the requests and shutil libraries, to get the image.

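import requests
import shutil

# stream=True defers the body so it can be copied to disk in pieces
r = requests.get(file['webContentLink'], stream=True)
r.raise_for_status()  # stop on an HTTP error instead of saving the error page
r.raw.decode_content = True  # undo gzip/deflate, which r.raw skips by default
with open('output_file', 'wb') as f:
    shutil.copyfileobj(r.raw, f)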
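If PIL still reports "cannot identify image file" for the saved output, it can help to look at what was actually downloaded. One possible cause, which this post does not confirm for the original problem, is that the bytes are not image data at all (for example an HTML sign-in or error page). The following is a rough diagnostic sketch using only the standard library; sniff_payload is a hypothetical helper, and it only recognizes a few common signatures.

```python
def sniff_payload(data: bytes) -> str:
    """Best-effort guess at what a downloaded payload contains."""
    if not data:
        return "empty"
    # Well-known magic numbers at the start of common image formats.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # An HTML page usually means a login or error response was saved.
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first bytes of a file on disk and classify them."""
    with open(path, "rb") as f:
        return sniff_payload(f.read(512))
```

For example, sniff_file('output_file') returning "html" would suggest the request was answered with a web page rather than the image, which points at authentication or sharing settings rather than at PIL.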