I decided to drop OpenCV because I only use one of its functions, so I am looking for a replacement for the cv2.imencode()
function. The goal is to convert a 2D NumPy array into an image format (such as .png) so that I can send it to the Google Cloud Vision API.
This is what I have been using until now:
content = cv2.imencode('.png', image)[1].tobytes()
image = vision.types.Image(content=content)
And now I am looking to achieve the same without using OpenCV.
Things I've found so far:
It is worth noting that my NumPy array is a binary image with only two dimensions, and the whole function will be used in an API, so saving a PNG to disk and reloading it should be avoided.
If you insist on using more or less pure Python, the following function from ideasman's answer to this question is useful.
def write_png(buf, width, height):
    """ buf: must be bytes or a bytearray in Python3.x,
        a regular string in Python2.x.
    """
    import zlib, struct

    # reverse the vertical line order and add null bytes at the start
    width_byte_4 = width * 4
    raw_data = b''.join(
        b'\x00' + buf[span:span + width_byte_4]
        for span in range((height - 1) * width_byte_4, -1, - width_byte_4)
    )

    def png_pack(png_tag, data):
        chunk_head = png_tag + data
        return (struct.pack("!I", len(data)) +
                chunk_head +
                struct.pack("!I", 0xFFFFFFFF & zlib.crc32(chunk_head)))

    return b''.join([
        b'\x89PNG\r\n\x1a\n',
        png_pack(b'IHDR', struct.pack("!2I5B", width, height, 8, 6, 0, 0, 0)),
        png_pack(b'IDAT', zlib.compress(raw_data, 9)),
        png_pack(b'IEND', b'')])
To represent the grayscale image as an RGBA image, we stack the matrix into 4 channels and set the alpha channel (supposing your 2D NumPy array is called "img"). We also flip the array vertically, because write_png reverses the row order when it assembles the scanlines, so the flip cancels that reversal.
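import base64
import numpy as np

# img is expected to hold 8-bit values (0-255); stack it into 4 channels
img_rgba = np.flipud(np.stack((img,) * 4, axis=-1))  # flip y-axis
img_rgba[:, :, -1] = 255  # set alpha channel to fully opaque
# tobytes() gives one byte per channel in row-major order, as write_png expects
buf = img_rgba.astype(np.uint8).tobytes()
data = write_png(buf, img_rgba.shape[1], img_rgba.shape[0])
data_enc = base64.b64encode(data)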
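As an aside, the RGBA stacking is only needed because write_png hard-codes color type 6. The sketch below is a variant, not part of ideasman's answer, that writes 8-bit grayscale (color type 0) directly. The name write_gray_png and the sample pixels are hypothetical; it assumes one byte per pixel with the top row first, so no vertical flip is required.

```python
import struct
import zlib


def write_gray_png(buf, width, height):
    """Encode height rows of width 8-bit gray samples (top row first) as PNG bytes."""
    if len(buf) != width * height:
        raise ValueError("buf must hold exactly width * height bytes")

    # Each scanline is prefixed with filter type 0 (no filtering).
    raw_data = b"".join(
        b"\x00" + bytes(buf[row * width:(row + 1) * width])
        for row in range(height)
    )

    def png_pack(png_tag, data):
        chunk_head = png_tag + data
        return (struct.pack("!I", len(data))
                + chunk_head
                + struct.pack("!I", 0xFFFFFFFF & zlib.crc32(chunk_head)))

    return b"".join([
        b"\x89PNG\r\n\x1a\n",
        # bit depth 8, color type 0 (grayscale), no interlacing
        png_pack(b"IHDR", struct.pack("!2I5B", width, height, 8, 0, 0, 0, 0)),
        png_pack(b"IDAT", zlib.compress(raw_data, 9)),
        png_pack(b"IEND", b""),
    ])


# Hypothetical 3x2 image: top row black, bottom row white.
pixels = bytes([0, 0, 0, 255, 255, 255])
png_bytes = write_gray_png(pixels, width=3, height=2)
```

With a NumPy array this would presumably be called as write_gray_png(img.astype(np.uint8).tobytes(), img.shape[1], img.shape[0]), assuming img already holds values in the 0-255 range (a 0/1 mask would need scaling by 255 first).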
Finally, to ensure the encoding works, we decode the base64 string and write the output to disk as "test_out.png". Check that this is the same image you started with.
with open("test_out.png", "wb") as fb:
    # base64.decodestring was removed in Python 3.9; b64decode is the replacement
    fb.write(base64.b64decode(data_enc))
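Since the question asks to avoid touching the disk, a rough in-memory sanity check is also possible. The sketch below is an illustration only: inspect_png, _chunk and the hand-built sample image are hypothetical names. It walks the PNG chunks, verifies each CRC, and returns the header fields. It does not decode the pixel data, so it complements a visual check rather than replacing it.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def inspect_png(png_bytes):
    """Return (width, height, bit_depth, color_type) after checking every chunk CRC."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos, header = 8, None
    while pos < len(png_bytes):
        # Chunk layout: 4-byte length, 4-byte tag, data, 4-byte CRC over tag + data.
        (length,) = struct.unpack("!I", png_bytes[pos:pos + 4])
        tag_and_data = png_bytes[pos + 4:pos + 8 + length]
        (crc,) = struct.unpack("!I", png_bytes[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(tag_and_data) & 0xFFFFFFFF != crc:
            raise ValueError("bad CRC in chunk %r" % tag_and_data[:4])
        if tag_and_data[:4] == b"IHDR":
            header = struct.unpack("!2I2B", tag_and_data[4:14])
        pos += 12 + length
    return header


def _chunk(tag, data):
    """Build one PNG chunk (used only to create the sample below)."""
    body = tag + data
    return struct.pack("!I", len(data)) + body + struct.pack("!I", zlib.crc32(body) & 0xFFFFFFFF)


# Hypothetical 1x1 opaque white RGBA image, built by hand for the demonstration.
sample = (PNG_SIGNATURE
          + _chunk(b"IHDR", struct.pack("!2I5B", 1, 1, 8, 6, 0, 0, 0))
          + _chunk(b"IDAT", zlib.compress(b"\x00\xff\xff\xff\xff"))
          + _chunk(b"IEND", b""))
info = inspect_png(sample)
```

Applied to the output above, inspect_png(base64.b64decode(data_enc)) should report the array's width and height followed by bit depth 8 and color type 6, since those are the values write_png puts in its IHDR chunk.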
However, I'm assuming that you are using some library to actually read your images in the first place (unless you are generating them)? Most image-reading libraries support this sort of thing. Supposing you are using PIL, you could also try the following snippet (from this answer). It saves the file in memory rather than on disk, and uses the result to generate a base64 string.
import base64
import io

# here img must be a PIL Image object, not a NumPy array
in_mem_file = io.BytesIO()
img.save(in_mem_file, format="PNG")
# reset file pointer to start
in_mem_file.seek(0)
img_bytes = in_mem_file.read()
base64_encoded_result_bytes = base64.b64encode(img_bytes)
base64_encoded_result_str = base64_encoded_result_bytes.decode('ascii')
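To connect this back to the question, here is a minimal sketch of going from a 2D NumPy array to PNG bytes through PIL without touching the disk. The helper name array_to_png_bytes and the sample mask are hypothetical, and the sketch assumes the array already holds values in the 0-255 range (a 0/1 binary mask would need multiplying by 255 first).

```python
import io

import numpy as np
from PIL import Image


def array_to_png_bytes(arr):
    """Encode a 2D array of 0-255 values as PNG bytes, entirely in memory."""
    # A 2D uint8 array is interpreted by PIL as an 8-bit grayscale image.
    pil_img = Image.fromarray(np.asarray(arr, dtype=np.uint8))
    buffer = io.BytesIO()
    pil_img.save(buffer, format="PNG")
    # getvalue() returns the whole buffer, so no seek(0) is needed.
    return buffer.getvalue()


# Hypothetical binary image: a white rectangle on a black background.
mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 2:5] = 255
png_bytes = array_to_png_bytes(mask)
```

The resulting png_bytes plays the same role as the output of cv2.imencode('.png', image)[1] in the question, so it could presumably be passed as content= to vision.types.Image in the same way, with no base64 step.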