Tags: python, numpy, opencv, image-processing, google-cloud-vision

Encoding a Numpy Array Image to an Image type (.png etc.) to use it with the GCloud Vision API - without OpenCV


After deciding to drop OpenCV because I only use one of its functions, I am looking for a replacement for cv2.imencode(). The goal is to convert a 2D NumPy array into an image format (such as .png) so that it can be sent to the GCloud Vision API.

This is what I was using until now:

content = cv2.imencode('.png', image)[1].tobytes()  # tostring() is a deprecated alias of tobytes()
image = vision.types.Image(content=content)

And now I am looking to achieve the same without using OpenCV.

Things I've found so far:

  • The Vision API needs base64-encoded data
  • cv2.imencode() returns the encoded bytes for the given image type

I think it is worth noting that my NumPy array is a binary image with only 2 dimensions, and that the whole function will be used in an API, so saving a PNG to disk and reloading it should be avoided.


Solution

  • PNG writer in pure Python

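    If you insist on using more or less pure Python, the following function (from ideasman's answer to this question) is useful. It expects an RGBA buffer (4 bytes per pixel) whose rows are stored bottom-up, which is why it reverses the row order before writing.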

    def write_png(buf, width, height):
        """ buf: must be bytes or a bytearray in Python3.x,
            a regular string in Python2.x.
        """
        import zlib, struct
    
        # reverse the vertical line order and add null bytes at the start
        width_byte_4 = width * 4
        raw_data = b''.join(
            b'\x00' + buf[span:span + width_byte_4]
            for span in range((height - 1) * width_byte_4, -1, - width_byte_4)
        )
    
        def png_pack(png_tag, data):
            chunk_head = png_tag + data
            return (struct.pack("!I", len(data)) +
                    chunk_head +
                    struct.pack("!I", 0xFFFFFFFF & zlib.crc32(chunk_head)))
    
        return b''.join([
            b'\x89PNG\r\n\x1a\n',
            png_pack(b'IHDR', struct.pack("!2I5B", width, height, 8, 6, 0, 0, 0)),
            png_pack(b'IDAT', zlib.compress(raw_data, 9)),
            png_pack(b'IEND', b'')])
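
Since the array in the question is 2D, a variant of the function above that writes an 8-bit greyscale PNG (colour type 0 in the IHDR chunk) might avoid expanding the data to four channels. The following is only a minimal sketch under that assumption: the name write_png_gray is hypothetical, and it expects one byte per pixel with the top row first, so no vertical flip is involved.

```python
import struct
import zlib


def write_png_gray(buf, width, height):
    """Encode 8-bit greyscale pixels (row-major, top row first) as PNG bytes.

    buf: bytes or bytearray of length width * height, one byte per pixel.
    """
    if len(buf) != width * height:
        raise ValueError("buffer size does not match width * height")

    # Each scanline is prefixed with filter type 0 (no filtering).
    raw_data = b"".join(
        b"\x00" + bytes(buf[row * width:(row + 1) * width])
        for row in range(height)
    )

    def png_chunk(tag, data):
        # A chunk is: length, tag, data, CRC over tag + data.
        body = tag + data
        return (struct.pack("!I", len(data)) + body +
                struct.pack("!I", zlib.crc32(body) & 0xFFFFFFFF))

    # IHDR: width, height, bit depth 8, colour type 0 (greyscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack("!2I5B", width, height, 8, 0, 0, 0, 0)
    return b"".join([
        b"\x89PNG\r\n\x1a\n",
        png_chunk(b"IHDR", ihdr),
        png_chunk(b"IDAT", zlib.compress(raw_data, 9)),
        png_chunk(b"IEND", b""),
    ])


# A 2x2 checkerboard: 0 is black, 255 is white.
pixels = bytes([0, 255, 255, 0])
png_bytes = write_png_gray(pixels, 2, 2)
```

With a NumPy array, the call would presumably look like write_png_gray(img.astype(np.uint8).tobytes(), img.shape[1], img.shape[0]). A binary image holding only 0 and 1 would need to be multiplied by 255 first, otherwise it renders almost entirely black.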
    

    Write NumPy array to PNG-formatted bytes, encode as base64

    To represent the grayscale image as an RGBA image, we stack the matrix into 4 channels and set the alpha channel to fully opaque (supposing your 2D NumPy array is called "img" and has dtype uint8). We also flip the array vertically, because write_png reverses the row order before writing.

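    import base64
    import numpy as np

    # img is assumed to be a 2D uint8 array (0 = black, 255 = white)
    img_rgba = np.flipud(np.stack((img,) * 4, axis=-1))  # flip y-axis; write_png flips it back
    img_rgba[:, :, -1] = 255  # set alpha channel to fully opaque
    data = write_png(img_rgba.tobytes(), img_rgba.shape[1], img_rgba.shape[0])  # raw PNG bytes
    data_enc = base64.b64encode(data)  # base64-encoded PNG bytes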
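
Whether the base64 step is needed probably depends on how the request is sent. The snippet in the question passes the raw bytes from cv2.imencode() straight to content=, which suggests that the Python client accepts raw PNG bytes (the "data" variable above), while a hand-built JSON request has to carry base64 text because JSON cannot hold raw bytes. The sketch below only illustrates that difference: the payload is a placeholder, and the request layout and the TEXT_DETECTION feature are assumptions to check against the Vision API documentation.

```python
import base64
import json

# Placeholder bytes standing in for the PNG produced by the encoder.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"example-payload"

# Raw bytes: what a bytes-typed content field would be given, as in the
# question's vision.types.Image(content=content).
raw_content = png_bytes

# Base64 text: what a JSON body has to carry instead of raw bytes.
b64_content = base64.b64encode(png_bytes).decode("ascii")

# Assumed layout of a JSON annotate request, for illustration only.
request_body = json.dumps({
    "requests": [{
        "image": {"content": b64_content},
        "features": [{"type": "TEXT_DETECTION"}],
    }]
})

# Decoding the base64 text gives back the original bytes.
round_trip = base64.b64decode(b64_content)
```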
    

    Test that encoding works properly

    Finally, to ensure the encoding works, we decode the base64 string and write the output to disk as "test_out.png". Check that this is the same image you started with.

    with open("test_out.png", "wb") as fb:
        # base64.decodestring() was removed in Python 3.9; b64decode() replaces it
        fb.write(base64.b64decode(data_enc))
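
If writing a file is undesirable even for testing, a rough sanity check can be done in memory by parsing the start of the PNG stream. The helper below is a sketch, and read_png_header is a hypothetical name: it only validates the signature and the IHDR chunk (including its CRC), so it confirms the dimensions and colour type but not the pixel data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data):
    """Return (width, height, bit_depth, color_type) from PNG bytes.

    Raises ValueError if the signature, the first chunk, or the IHDR
    checksum is wrong.
    """
    # Signature (8) + length (4) + tag (4) + IHDR data (13) + CRC (4) = 33 bytes.
    if len(data) < 33 or data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature or truncated data")
    length, tag = struct.unpack("!I4s", data[8:16])
    if tag != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    ihdr = data[16:29]
    (crc,) = struct.unpack("!I", data[29:33])
    if zlib.crc32(b"IHDR" + ihdr) & 0xFFFFFFFF != crc:
        raise ValueError("IHDR checksum mismatch")
    width, height, bit_depth, color_type = struct.unpack("!2I2B", ihdr[:10])
    return width, height, bit_depth, color_type
```

For the RGBA output above, read_png_header(base64.b64decode(data_enc)) should report the array's width and height, a bit depth of 8, and colour type 6.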
    

    Alternative: Just use PIL

    However, I assume you are using some library to read your images in the first place (unless you are generating them)? Most image libraries support this sort of thing. Supposing you are using PIL, you could also try the following snippet (from this answer). It saves the image to an in-memory buffer rather than to disk, and uses that buffer to generate a base64 string.

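    import base64
    import io
    from PIL import Image

    img = Image.fromarray(img)  # only needed if img is a NumPy array (2D uint8 -> mode "L")
    in_mem_file = io.BytesIO()
    img.save(in_mem_file, format="PNG")
    # reset file pointer to start
    in_mem_file.seek(0)
    img_bytes = in_mem_file.read()  # raw PNG bytes

    base64_encoded_result_bytes = base64.b64encode(img_bytes)
    base64_encoded_result_str = base64_encoded_result_bytes.decode('ascii')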