python machine-learning label object-detection yolo

Convert KITTI labels to YOLO

I'm trying to convert KITTI labels to YOLO format, but after conversion the bounding box is misplaced. This is the KITTI bounding box:

This is conversion code:

def convertToYoloBBox(bbox, size):
    # Yolo uses bounding bbox coordinates and size relative to the image size.
    # This is taken from https://pjreddie.com/media/files/voc_label.py .
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (bbox[0] + bbox[1]) / 2.0
    y = (bbox[2] + bbox[3]) / 2.0
    w = bbox[1] - bbox[0]
    h = bbox[3] - bbox[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


convert = convertToYoloBBox([kitti_bbox[0], kitti_bbox[1], kitti_bbox[2], kitti_bbox[3]], image.shape[:2])

The function normalizes the coordinates to the image size, as YOLO requires, and outputs the following:

(0.14763590391908976, 0.3397063758389261, 0.20452591656131477, 0.01810402684563757)

But when I try to check whether the normalization is correct with this code:

x = int(convert[0] * image.shape[0])
y = int(convert[1] * image.shape[1])
width = x + int(convert[2] * image.shape[0])
height = y + int(convert[3] * image.shape[1])

cv.rectangle(image, (int(x), int(y)), (int(width), int(height)), (255, 0, 0), 2)

The bounding box is misplaced:

Any suggestions? Is the conversion function correct, or is the problem in the checking code?

Solution

You got the centroid calculation wrong.

KITTI labels give the 2D bounding box in the order left, top, right, bottom.
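For reference, in a KITTI object label line the four box values are fields 4 to 7 (0-based) of the whitespace-separated line, after type, truncated, occluded and alpha. A minimal sketch of pulling them out; `parse_kitti_line` and `KittiBox` are hypothetical names, and the sample line uses made-up values rather than a real dataset entry.

```python
from typing import NamedTuple


class KittiBox(NamedTuple):
    """2D box from one KITTI object label line, in pixels."""
    cls: str
    left: float
    top: float
    right: float
    bottom: float


def parse_kitti_line(line: str) -> KittiBox:
    # Fields: type, truncated, occluded, alpha, then the 2D bbox as
    # left, top, right, bottom, followed by the 3D dimensions,
    # location and rotation_y (15 fields in ground-truth files).
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    left, top, right, bottom = map(float, fields[4:8])
    return KittiBox(fields[0], left, top, right, bottom)


# Illustrative line with made-up values (not taken from the dataset).
box = parse_kitti_line(
    "Car 0.00 0 -1.57 600.0 150.0 640.0 190.0 1.5 1.6 3.9 1.0 1.7 20.0 -1.5"
)
print(box)
```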

To get the centroid, compute (left + right) / 2 and (top + bottom) / 2, so your code becomes:

x = (bbox[0] + bbox[2]) / 2.0
y = (bbox[1] + bbox[3]) / 2.0
w = bbox[2] - bbox[0]
h = bbox[3] - bbox[1]
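Two more things in the question's code are worth checking. For NumPy/OpenCV images, `image.shape[:2]` is `(height, width)`, so passing it as `size` makes `size[0]` the height and scales x by the wrong dimension. And YOLO's x and y are the box center, so a drawing check has to subtract half the width and height to recover the top-left corner, and multiply x by the image width (`image.shape[1]`), not `image.shape[0]`. A minimal sketch putting this together; `kitti_to_yolo` and `yolo_to_corners` are hypothetical names, and the 1242x375 frame size is just an example value.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def kitti_to_yolo(bbox: Box, img_w: int, img_h: int) -> Box:
    """(left, top, right, bottom) in pixels -> normalized (x_center, y_center, w, h)."""
    left, top, right, bottom = bbox
    x = (left + right) / 2.0 / img_w
    y = (top + bottom) / 2.0 / img_h
    w = (right - left) / img_w
    h = (bottom - top) / img_h
    return x, y, w, h


def yolo_to_corners(yolo_box: Box, img_w: int, img_h: int):
    """Normalized (x_center, y_center, w, h) -> integer pixel corners for drawing."""
    x, y, w, h = yolo_box
    # x and y are the center, so step back half the size for the top-left corner.
    left = round((x - w / 2.0) * img_w)
    top = round((y - h / 2.0) * img_h)
    right = round((x + w / 2.0) * img_w)
    bottom = round((y + h / 2.0) * img_h)
    return (left, top), (right, bottom)


# With a real image: img_h, img_w = image.shape[:2]  (height comes first)
img_w, img_h = 1242, 375  # example frame size
yolo = kitti_to_yolo((600.0, 150.0, 640.0, 190.0), img_w, img_h)
pt1, pt2 = yolo_to_corners(yolo, img_w, img_h)
print(yolo, pt1, pt2)
# cv.rectangle(image, pt1, pt2, (255, 0, 0), 2)
```

Round-tripping through both functions should give back the original KITTI corners (up to integer rounding), which is a quick way to confirm the conversion before drawing.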