
Saving a 3D 32bit float array to a 48bit integer PNG in Python to match Kitti Ground Truth Format


Kitti has a benchmark for Optical Flow. They require the flow estimate to be 48bit PNG files to match the format of the ground truth files they have.

Ground Truth PNG Image is available for download here

Kitti have a Matlab DevKit for the estimate versus ground truth comparison.

I want to output the flow from my network as 48 bit integer PNG files, so that my flow estimates can be compared with other Kitti benchmarked flow estimates.

The numpy scaled flow file from the network is downloadable from here

However, I'm having trouble converting the float32 3D flow array to 3 channel 48bit files (16bit per channel) in Python, either because the image libraries I tried don't seem to support this, or because I am doing something wrong in my code. Can anyone help?

I have tried a bunch of different libraries and read lots of posts.

Unfortunately, Scipy outputs a png that is only 24bit. The output flow estimate png generated using scipy is available here

# Numpy Flow to 48bit PNG with 16bits per channel

import scipy as sp
from scipy import misc
import numpy as np
import png
import imageio
import cv2
from PIL import Image
from matplotlib import image

"""From Kitti DevKit:-

Optical flow maps are saved as 3-channel uint16 PNG images: The first channel
contains the u-component, the second channel the v-component and the third
channel denotes if the pixel is valid or not (1 if true, 0 otherwise). To convert
the u-/v-flow into floating point values, convert the value to float,
subtract 2^15 and divide the result by 64.0:"""

Scaled_Flow = np.load('Scaled_Flow.npy') # This is a 32bit float
# This is the very first Kitti Test Flow Output from image_2 testing folder  
# passed through DVF
# The network that produced this flow is only trained to 51 steps, so it 
# won't provide an accurate correspondence
# But the Estimated Flow PNG should look green

# Kitti devkit readme says that third channel is 1 if flow is valid for that pixel
# 2 for batch size, 375 for height, 1242 for width, 1 for this extra layer of ones.
ones = np.float32(np.ones((2,375,1242,1)))
with_ones = np.concatenate((Scaled_Flow, ones), axis=3)

im = sp.misc.toimage(with_ones[-1,:,:,:], cmin=-1.0, cmax=1.0) # saves image object
im.save("Scipy_24bit.png", dtype="uint48") # Outputs 24bit only.

# An attempt at converting the format from float 32 to 16 bit integers
Flow = np.int16(with_ones)
f512 = Flow * 512 # Kitti instructs that the flows are scaled by 512.

x = np.array(Scaled_Flow)
x.astype(np.uint16) # another attempt at converting it to unsigned 16 bit integers

try: # try PyPNG
    with open('PyPNGuint48bit.png', 'wb') as f:
        writer = png.Writer(width=375, height=1242, bitdepth=16)
        # Convert z to the Python list of lists expected by
        # the png writer.
        #z2list = x.reshape(-1, x.shape[1]*x.shape[2]).tolist()
        writer.write(f, x)
except:
    print("png lib approach didn't work, it might be to do with the sizing")

try: # try imageio
    imageio.imwrite('imageio_Flow_48bit.png', x, format='PNG-FI')
except:
    print("imageio approach didn't work, it probably couldn't handle the datatype")

try: # try OpenCV
    cv2.imwrite('OpenCVFlow_48bit_.png',x )
except:
    print("OpenCV approach didn't work, it probably couldn't handle the datatype")

try: # try PIL
    im = Image.fromarray(x)
    im.save("PILLOW_Flow_48bit.png", format="PNG")
except:
    print("PILLOW approach didn't work, it probably couldn't handle the datatype")

try: # try Matplotlib
    image.imsave('MatplotLib_Flow_48bit.png', x)
except:
    print("Matplotlib approach didn't work, ValueError: object too deep for desired array")

I want to get a 48bit png file the same as the Kitti Ground truth, that looks green. Currently Scipy outputs a 24bit png file that is blue and white looking.
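
One way to confirm whether any of these output files really is 48bit is to read the bit depth straight from the PNG header. The sketch below uses only the standard library and relies on the fixed layout of the IHDR chunk; the helper name png_header_info is hypothetical, and the example header bytes are built by hand for illustration rather than read from a real file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header_info(data):
    """Return (width, height, bit_depth, color_type) from the first 26 bytes of a PNG.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    then width (4 bytes), height (4 bytes), bit depth (1 byte), color type (1 byte).
    """
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG stream")
    return struct.unpack(">IIBB", data[16:26])


# Hand-built header for a 1242x375 image, 16 bits per sample, color type 2 (RGB).
example = PNG_SIGNATURE + struct.pack(">I4sIIBB", 13, b"IHDR", 1242, 375, 16, 2)
width, height, bit_depth, color_type = png_header_info(example)
print(width, height, bit_depth, color_type)
```

For a file on disk, the same helper can be fed the first 26 bytes, for example open("Scipy_24bit.png", "rb").read(26). A bit depth of 16 with color type 2 (RGB, three samples per pixel) corresponds to a 48bit image, while a bit depth of 8 with color type 2 is 24bit.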


Solution

  • Here is my understanding of what you want to do:

    1. Load the data from Scaled_Flow.npy. This is a 32 bit floating point numpy array with shape (2, 375, 1242, 2).
    2. Convert Scaled_Flow[1] (an array with shape (375, 1242, 2)) to 16 bit unsigned integers by:

      • multiplying by 64,
      • adding 2**15, and
      • casting the values to np.uint16.

      That is the inverse of this description that you quoted: "To convert the u-/v-flow into floating point values, convert the value to float, subtract 2^15 and divide the result by 64.0".

    3. Increase the length of the third dimension from 2 to 3 by concatenating an array of all 1s.
    4. Save the result to a PNG file.

    Here's one way you can do that. To create the PNG file, I'll use numpngw, a library that I wrote for creating PNG and animated PNG files from numpy arrays. If you give numpngw.write_png a numpy array with data type np.uint16, it will create a PNG file with 16 bits per channel (i.e. a 48 bit image in this case).

    import numpy as np
    from numpngw import write_png
    
    
    Scaled_Flow = np.load('Scaled_Flow.npy')
    sf16 = (64*Scaled_Flow[-1] + 2**15).astype(np.uint16)
    imgdata = np.concatenate((sf16, np.ones(sf16.shape[:2] + (1,), dtype=sf16.dtype)), axis=2)
    
    write_png('sf48.png', imgdata)
    

    Here is the image that is created by that script.

    png file
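
As a sanity check on the scaling in step 2, the encoding and its inverse from the devkit description can be written as a pair of functions and round-tripped in memory. This is a minimal sketch, not the devkit's code: the names encode_kitti_flow and decode_kitti_flow are hypothetical, and the rounding and clipping before the cast are additions that the script above does not perform.

```python
import numpy as np


def encode_kitti_flow(flow, valid=None):
    """Map a float (H, W, 2) flow array to a uint16 (H, W, 3) array.

    Channels 0 and 1 hold 64 * flow + 2**15; channel 2 holds the valid mask.
    """
    flow = np.asarray(flow, dtype=np.float64)
    scaled = np.rint(flow * 64.0 + 2**15)
    # Clip so out-of-range flow values saturate instead of wrapping around.
    scaled = np.clip(scaled, 0, 2**16 - 1).astype(np.uint16)
    if valid is None:
        valid = np.ones(flow.shape[:2], dtype=np.uint16)
    return np.dstack((scaled, np.asarray(valid).astype(np.uint16)))


def decode_kitti_flow(img):
    """Inverse mapping: convert to float, subtract 2**15, divide by 64."""
    flow = (img[..., :2].astype(np.float32) - 2**15) / 64.0
    valid = img[..., 2] > 0
    return flow, valid


# Tiny 1x2 flow field whose values are exact multiples of 1/64.
flow = np.array([[[0.0, 1.0], [-2.5, 3.25]]])
img = encode_kitti_flow(flow)
decoded, valid = decode_kitti_flow(img)
print(img.dtype, img.shape)
print(np.allclose(decoded, flow), valid.all())
```

The array returned by encode_kitti_flow has the same layout as imgdata above, so it could be passed to write_png in the same way. Values that are not multiples of 1/64 will only round-trip to within 1/128 of the original because of the rounding step.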