Tags: python, image, python-imaging-library, google-cloud-vision

Pass PIL Image to google cloud vision without saving and reading


UPDATE BELOW

Is there a way to pass a PIL Image to google cloud vision?

I tried io.BytesIO, io.StringIO and Image.tobytes(), but I always get:

Traceback (most recent call last):
  File "C:\Users\...\vision_api.py", line 20, in get_text
    image = vision.Image(content)
  File "C:\...\venv\lib\site-packages\proto\message.py", line 494, in __init__
    raise TypeError(
TypeError: Invalid constructor input for Image:b'Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x80Ma\x81La\x81Ma\x81Ma\x81Ma\x80Ma\x81Ma\x81Ma\x81Ma\x8 ...

or this if I pass the PIL-Image directly:

TypeError: Invalid constructor input for Image: <PIL.Image.Image image mode=RGB size=480x300 at 0x1D707131DC0>

This is my code:

image = Image.open(path).convert('RGB')   # Opening the saved image
cropped_image = image.crop((30, 900, 510, 1200))   # Cropping the image

vision_image = vision.Image(...)   # Placeholder: I tried the different options here, but I don't know how to pass the image
client = vision.ImageAnnotatorClient()
response = client.text_detection(image=vision_image)   # Text detection using google-vision-api

FOR CLARITY:

I want Google text detection to analyse only a certain part of an image saved on my disk. My idea was to crop the image using PIL and then pass the cropped image to google-vision. But passing a PIL image to vision.Image does not work; I get the error above.

From Google's documentation, in the vision.Image class:

Attributes:
        content (bytes):
            Image content, represented as a stream of bytes. Note: As
            with all ``bytes`` fields, protobuffers use a pure binary
            representation, whereas JSON representations use base64.

            Currently, this field only works for BatchAnnotateImages
            requests. It does not work for AsyncBatchAnnotateImages
            requests.

A working option is to save the PIL-Image as a PNG/JPG on my disk and load it using:

with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

vision_image = vision.Image(content=content)

But this is slow and seems unnecessary, and the whole point of using google-vision-api for me is its speed compared to open-cv.

UPDATE as of 25/9/2021

from PIL import Image
from io import BytesIO
from google.cloud import vision


with open('images/screenshots/screenshot.png', 'rb') as image_file:
    data = image_file.read()
    try:
        image = vision.Image(content=data)
        print('worked')

    except TypeError:
        print('failed')


im = Image.open('images/screenshots/screenshot.png')
buffer = BytesIO()
im.save(buffer, format='PNG')
try:
    image = vision.Image(buffer.getvalue())
    print('worked')

except TypeError:
    print('failed')

The first version works as expected, but I can't get the second one to work as @Mark Setchell recommended. The first ~50 bytes of the two inputs are the same; the rest is completely different.

UPDATE as of 26/9/2021

Both inputs are of type <class 'bytes'>. The complete error stack can be seen at the top of the question.

Using this code:

print(input_data[:200])
print(type(input_data))

I get the following output:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x07\x80\x08\x06\x00\x00\x00+a\xe7\n\x00\x00\x00\x04sBIT\x08\x08\x08\x08|\x08d\x88\x00\x00 \x00IDATx\x9c\xec\xbdy\xd8-\xc7Y\x1f\xf8\xab\xea>\xe7\xdb\xef\xaa\xbbk\xb3%\xcb\x8b\x16[\x12\xc6\xc8\xbb,\x1b\x03\x06\xc6\x8111\x93@2y\xc2381\x8b1\x90\x10\x9e\xf18\x93\x10\x0811\x84\x192\x0c3\x9e\x1020\x03\x03\xc3\xb0\x04\xf0C0\xc6\x96m\xc9\x96m\xed\xb2dI\x96\xaetu\xf7\xed\xdb\xcf\xe9\xae\x9a?j\xe9\xea\xbd\xba\xbb\xbaO\x9f\xef\x9e\xd7\xd6\xfd\xfat\xbf\xf5Vu-o\xbd\xf5\xeb\xb7\xde"\xef\xff\xc7\'8\x1c\x13\x07\x00\xd2\x82\xcc6\xe5\xc6\xa8B&'
<class 'bytes'>

for the working input. And:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x07\x80\x08\x06\x00\x00\x00+a\xe7\n\x00\x01\x00\x00IDATx\x9c\xec\xbdw\x80$\xc7u\x1f\xfc\xab\xea\xeeI\x9bw/\'\x1cr\xce\x04@\x10\x04A\x82`\x84\x95%J"\x95,\xcb\x1f%\x91T\xb0$*}\x1fM\xd9\x96\x95EY\x94(\xc9\xb6\x92i+\x90\x12\x83(3)0\x82\x08$rN\x07\\\xce\xb7\xb7yBw\xd5\xf7G\x85\xaeN3\xdd=\xdd\xb3\xb3{\xfb\xc8\xc3\xceLW\xbd\xca\xaf\xde\xfb\xf5\xabW\xe4{\xdeu\x84\xa3`\xe2\x00@J\xe0Y&\xdf\x00e($\x94\x94\'p\xcc\xc3\xda\xe7Y\x0c\xf1Te\x13\xbf\xcc>\xfa:]Y=x\x84\x7f\xe8\xc23u\x1f\x91l\xfd\x99'
<class 'bytes'>

for the failing input.
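
A note on the two dumps above: both start with a valid PNG signature and IHDR chunk, and the working one additionally contains an sBIT chunk that the re-saved one lacks. Differing bytes after the header are not by themselves a sign of a broken image, because a PNG encoder is free to choose different ancillary chunks and compression settings for the same pixels. The sketch below illustrates this with a synthetic gradient (the helper name `encode_png` and the test image are made up for illustration): two encodings of one image differ as byte strings but decode to identical pixel data.

```python
from io import BytesIO

from PIL import Image


def encode_png(im, compress_level):
    """Encode a PIL image as PNG in memory with the given zlib level."""
    buf = BytesIO()
    im.save(buf, format="PNG", compress_level=compress_level)
    return buf.getvalue()


# Synthetic 64x64 RGB gradient (illustrative stand-in for a screenshot)
raw = bytes(
    v
    for y in range(64)
    for x in range(64)
    for v in (x * 4, y * 4, (x + y) * 2)
)
im = Image.frombytes("RGB", (64, 64), raw)

a = encode_png(im, 0)  # stored, no compression
b = encode_png(im, 9)  # maximum compression

# Same 8-byte PNG signature, different byte streams overall
same_signature = a[:8] == b[:8] == b"\x89PNG\r\n\x1a\n"
bytes_differ = a != b

# Decoding both gives back exactly the same pixels
same_pixels = (
    Image.open(BytesIO(a)).tobytes() == Image.open(BytesIO(b)).tobytes()
)

print(same_signature, bytes_differ, same_pixels)
```

So the byte difference between the two inputs is expected and is unlikely to be what triggers the TypeError.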


Solution

  • As far as I can tell, you start off with a PIL Image and you want to obtain a PNG image in memory without going to disk. So you need this:

    #!/usr/bin/env python3
    
    from PIL import Image
    from io import BytesIO
    
    # Create PIL Image like you have - filled with red
    im = Image.new('RGB', (320,240), (255,0,0))
    
    # Create in-memory PNG - like you want for Google Cloud Vision
    buffer = BytesIO()
    im.save(buffer, format="PNG")
    
    # Look at first few bytes
    PNG = buffer.getvalue()
    print(PNG[:20])
    

    It prints this, which is exactly what you would get if you wrote the image to disk as a PNG and then read it back as binary - except this does it in memory without going to disk:

    b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
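
To connect this to the code in the question: as far as I can tell from the traceback, the TypeError comes from passing the bytes positionally (`vision.Image(content)`), whereas the working snippet uses the keyword form (`vision.Image(content=content)`). The message classes in the Vision client library accept a mapping or another message as their positional argument, so raw bytes there are rejected. A minimal sketch, assuming `google-cloud-vision` is installed and credentials are configured; the helper names `pil_to_png_bytes` and `detect_text` are hypothetical and the red image stands in for the cropped one:

```python
from io import BytesIO

from PIL import Image


def pil_to_png_bytes(im):
    """Encode a PIL image as PNG entirely in memory and return the bytes."""
    buffer = BytesIO()
    im.save(buffer, format="PNG")
    return buffer.getvalue()


def detect_text(im):
    """Send a PIL image to Cloud Vision text detection (hypothetical helper).

    Needs the google-cloud-vision package and valid credentials, so the
    import is done lazily and this function is not called below.
    """
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    # Pass the bytes with the 'content' keyword, not positionally
    vision_image = vision.Image(content=pil_to_png_bytes(im))
    return client.text_detection(image=vision_image)


# Stand-in for the cropped image from the question (same size, filled red)
cropped = Image.new("RGB", (480, 300), (255, 0, 0))
png_bytes = pil_to_png_bytes(cropped)
print(png_bytes[:8])
```

With a real image you would call `detect_text(image.crop((30, 900, 510, 1200)))` and read the annotations from the returned response; no temporary file is written at any point.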