Kitti has a benchmark for optical flow. Flow estimates must be submitted as 48-bit PNG files, matching the format of their ground truth files.
Ground Truth PNG Image is available for download here
Kitti have a Matlab DevKit for the estimate versus ground truth comparison.
I want to output the flow from my network as 48 bit integer PNG files, so that my flow estimates can be compared with other Kitti benchmarked flow estimates.
The numpy scaled flow file from the network is downloadable from here
However, I'm having trouble converting the float32 flow array to 3-channel 48-bit PNG files (16 bits per channel) in Python, either because the image libraries I tried don't support this or because I'm doing something wrong in my code. Can anyone help?
I have tried a bunch of different libraries and read lots of posts.
Unfortunately, SciPy outputs a PNG that is only 24-bit. The flow estimate PNG generated using SciPy is available here
# Numpy Flow to 48bit PNG with 16bits per channel
import scipy as sp
from scipy import misc
import numpy as np
import png
import imageio
import cv2
from PIL import Image
from matplotlib import image
"""From Kitti DevKit:-
Optical flow maps are saved as 3-channel uint16 PNG images: The first channel
contains the u-component, the second channel the v-component and the third
channel denotes if the pixel is valid or not (1 if true, 0 otherwise). To convert
the u-/v-flow into floating point values, convert the value to float,
subtract 2^15 and divide the result by 64.0:"""
Scaled_Flow = np.load('Scaled_Flow.npy') # This is a 32bit float
# This is the very first Kitti Test Flow Output from image_2 testing folder
# passed through DVF
# The network that produced this flow is only trained to 51 steps, so it
# won't provide an accurate correspondence
# But the Estimated Flow PNG should look green
# Kitti devkit readme says that third channel is 1 if flow is valid for that pixel
# 2 for batch size, 375 for height, 1242 for width, 1 for this extra layer of ones.
ones = np.float32(np.ones((2,375,1242,1)))
with_ones = np.concatenate((Scaled_Flow, ones), axis=3)
im = sp.misc.toimage(with_ones[-1,:,:,:], cmin=-1.0, cmax=1.0) # saves image object
im.save("Scipy_24bit.png", dtype="uint48") # Outputs 24bit only.
# An attempt at converting the format from float 32 to 16 bit integers
Flow = np.int16(with_ones)
f512 = Flow * 512 # Kitti instructs that the flows are scaled by 512.
x = np.array(Scaled_Flow)
# another attempt at converting it to unsigned 16 bit integers
x.astype(np.uint16)
try: # try PyPNG
    with open('PyPNGuint48bit.png', 'wb') as f:
        writer = png.Writer(width=375, height=1242, bitdepth=16)
        # Convert z to the Python list of lists expected by
        # the png writer.
        #z2list = x.reshape(-1, x.shape[1]*x.shape[2]).tolist()
        writer.write(f, x)
except:
    print("png lib approach didn't work, it might be to do with the sizing")
try: # try imageio
    imageio.imwrite('imageio_Flow_48bit.png', x, format='PNG-FI')
except:
    print("imageio approach didn't work, it probably couldn't handle the datatype")
try: # try OpenCV
    cv2.imwrite('OpenCVFlow_48bit_.png', x)
except:
    print("OpenCV approach didn't work, it probably couldn't handle the datatype")
try: # try PIL
    im = Image.fromarray(x)
    im.save("PILLOW_Flow_48bit.png", format="PNG")
except:
    print("PILLOW approach didn't work, it probably couldn't handle the datatype")
try: # try Matplotlib
    image.imsave('MatplotLib_Flow_48bit.png', x)
except:
    print("Matplotlib approach didn't work, ValueError: object too deep for desired array")
I want to get a 48-bit PNG file like the Kitti ground truth, which looks green. Currently SciPy outputs a 24-bit PNG file that looks blue and white.
Here is my understanding of what you want to do:
1. Load the array saved in Scaled_Flow.npy. This is a 32 bit floating point numpy array with shape (2, 375, 1242, 2).
2. Convert Scaled_Flow[1] (an array with shape (375, 1242, 2)) to 16 bit unsigned integers by:
   - multiplying by 64,
   - adding 2**15, and
   - casting to np.uint16.
   That is the inverse of this description that you quoted: "To convert the u-/v-flow into floating point values, convert the value to float, subtract 2^15 and divide the result by 64.0".
Here's one way you can do that. To create the PNG file, I'll use numpngw, a library that I wrote for creating PNG and animated PNG files from numpy arrays. If you give numpngw.write_png a numpy array with data type np.uint16, it will create a PNG file with 16 bits per channel (i.e. a 48 bit image in this case).
import numpy as np
from numpngw import write_png
Scaled_Flow = np.load('Scaled_Flow.npy')
sf16 = (64*Scaled_Flow[-1] + 2**15).astype(np.uint16)
imgdata = np.concatenate((sf16, np.ones(sf16.shape[:2] + (1,), dtype=sf16.dtype)), axis=2)
write_png('sf48.png', imgdata)
Here is the image that is created by that script.
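If you want to sanity-check the conversion before writing any file, a round trip in plain numpy may help. The following is only a sketch under a few assumptions: the function names encode_kitti_flow and decode_kitti_flow are hypothetical, the random demo array stands in for Scaled_Flow[-1], and the rounding and clipping steps are additions of mine that the script above does not perform (it truncates when casting).

```python
import numpy as np


def encode_kitti_flow(flow, valid=None):
    """Encode an (H, W, 2) float flow field as an (H, W, 3) uint16 array."""
    flow = np.asarray(flow, dtype=np.float64)
    # Inverse of the devkit decoding: multiply by 64, then add 2**15.
    # Rounding to the nearest integer is an assumption, not devkit behaviour.
    scaled = np.rint(flow * 64.0 + 2**15)
    # Clip so out-of-range flow cannot wrap around when cast to uint16.
    uv = np.clip(scaled, 0, 2**16 - 1).astype(np.uint16)
    if valid is None:
        # Assume every pixel is valid, as in the script above.
        valid = np.ones(flow.shape[:2], dtype=bool)
    mask = np.asarray(valid, dtype=np.uint16)[..., np.newaxis]
    return np.concatenate((uv, mask), axis=2)


def decode_kitti_flow(img):
    """Decode an (H, W, 3) uint16 array back to float flow and a validity mask."""
    img = np.asarray(img)
    # Devkit description: convert to float, subtract 2^15, divide by 64.0.
    flow = (img[..., :2].astype(np.float64) - 2**15) / 64.0
    valid = img[..., 2] == 1
    return flow, valid


# Hypothetical stand-in data; the real input would be Scaled_Flow[-1].
rng = np.random.default_rng(0)
demo_flow = rng.uniform(-20.0, 20.0, size=(4, 5, 2)).astype(np.float32)

encoded = encode_kitti_flow(demo_flow)
decoded, valid = decode_kitti_flow(encoded)
max_error = float(np.abs(decoded - demo_flow).max())
```

With rounding, the decoded flow should differ from the input by at most 1/128 of a pixel, which is half of the 1/64 quantisation step. The encoded array has dtype uint16 and three channels, so it can be passed to write_png in the same way as imgdata above.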