Tags: pytorch, yolo, pose-estimation, keypoint

How to prepare a custom keypoints dataset for WongKinYiu/Yolov7 Pose Estimation?



Solution

  • The keypoints format is described here

    https://cocodataset.org/#format-data

    In particular, this fragment

    annotation{
        "keypoints" : [x1,y1,v1,...],
        ...
    }
    

    says that keypoints are stored as a flat array [x1, y1, v1, x2, y2, v2, ...], where each v is a visibility flag: 0 = not labeled (x and y are 0 as well), 1 = labeled but not visible, and 2 = labeled and visible.
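
    For instance, splitting such an array back into (x, y, v) triples (a minimal illustration with made-up values):

    kpts = [230, 150, 2, 0, 0, 0]  # made up: first keypoint labeled and visible, second not labeled
    triples = [tuple(kpts[i:i + 3]) for i in range(0, len(kpts), 3)]
    # -> [(230, 150, 2), (0, 0, 0)]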

    The official yolov7 pose branch, https://github.com/WongKinYiu/yolov7/tree/pose, links to a prepared COCO dataset [Keypoints Labels of MS COCO 2017]. Download it, extract it, and go to the directory labels/train2017. Open any of the .txt files and you will see lines looking something like this

    0 0.671279 0.617945 0.645759 0.726859 0.519751 0.381250 2.000000 0.550936 0.348438 2.000000 0.488565 0.367188 2.000000 0.642412 0.354687 2.000000 0.488565 0.395313 2.000000 0.738046 0.526563 2.000000 0.446985 0.534375 2.000000 0.846154 0.771875 2.000000 0.442827 0.812500 2.000000 0.925156 0.964063 2.000000 0.507277 0.698438 2.000000 0.702703 0.942187 2.000000 0.555094 0.950000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
    

    Each such line holds 56 whitespace-separated values: a class id, four box values, and 17 keypoints × 3 values each (COCO's 17 person keypoints). The format is

    class x_center y_center width height kpt1_x kpt1_y kpt1_v kpt2_x kpt2_y kpt2_v ...
    
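
    All coordinates are normalized to [0, 1]: x and width by the image width, y and height by the image height. Note that the box is stored as its center plus size, not as corner points; the conversion to corners happens at load time, as shown in the code below. To produce such lines from existing COCO annotations, here is a minimal sketch (it assumes the standard COCO fields, bbox as [x, y, w, h] in absolute pixels and keypoints as the flat array; the function name is just for illustration):

    def coco_to_yolo_pose_line(ann, img_w, img_h, class_id=0):
        # box: COCO's absolute top-left corner + size -> normalized center + size
        x, y, w, h = ann["bbox"]
        parts = [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]
        # keypoints: normalize x and y, keep the COCO visibility flag as-is;
        # unlabeled keypoints (v == 0) stay as "0 0 0", as in the sample line above
        kpts = ann["keypoints"]
        for i in range(0, len(kpts), 3):
            kx, ky, v = kpts[i], kpts[i + 1], kpts[i + 2]
            parts += [kx / img_w, ky / img_h, float(v)] if v > 0 else [0.0, 0.0, 0.0]
        return str(class_id) + " " + " ".join(f"{p:.6f}" for p in parts)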

    This is the code (from utils/general.py in the pose branch) that converts these normalized labels to pixel coordinates at load time

    
    import numpy as np
    import torch

    def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0, kpt_label=False):
        # Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
        # it does the same operation as above for the key-points
        y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
        y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top left x
        y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top left y
        y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom right x
        y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom right y
        if kpt_label:
            # each keypoint occupies only 2 columns here, so the visibility
            # flags have already been stripped by the time this runs
            num_kpts = (x.shape[1] - 4) // 2
            for kpt in range(num_kpts):
                for kpt_instance in range(y.shape[0]):
                    # only denormalize labeled keypoints; unlabeled ones stay at 0
                    if y[kpt_instance, 2 * kpt + 4] != 0:
                        y[kpt_instance, 2 * kpt + 4] = w * y[kpt_instance, 2 * kpt + 4] + padw
                    if y[kpt_instance, 2 * kpt + 1 + 4] != 0:
                        y[kpt_instance, 2 * kpt + 1 + 4] = h * y[kpt_instance, 2 * kpt + 1 + 4] + padh
        return y
    
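
    Applying the same arithmetic by hand to the box of the sample line above (with the function's 640×640 defaults and no padding) makes the conversion concrete:

    cx, cy, bw, bh = 0.671279, 0.617945, 0.645759, 0.726859  # box from the sample label line
    x1 = 640 * (cx - bw / 2)  # ~223.0, top left x in pixels
    y1 = 640 * (cy - bh / 2)  # ~162.9, top left y
    x2 = 640 * (cx + bw / 2)  # ~636.3, bottom right x
    y2 = 640 * (cy + bh / 2)  # ~628.1, bottom right y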

    The function is called from

    labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1], kpt_label=self.kpt_label)
    

    Note the 1 offset in labels[:, 1:], which skips the class id. The label coordinates must be normalized to [0, 1], as enforced by these assertions (the 5::3 and 6::3 slices pick out the keypoint x and y columns of the raw labels, which at that point still include the visibility flags)

    assert (l[:, 5::3] <= 1).all(), 'non-normalized or out of bounds coordinate labels'  # keypoint x columns
    assert (l[:, 6::3] <= 1).all(), 'non-normalized or out of bounds coordinate labels'  # keypoint y columns
    
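
    Before training, it is worth running equivalent checks over your own label files. A small sketch (the path is a placeholder, and num_kpts=17 assumes COCO-style person keypoints):

    import numpy as np

    def check_label_file(path, num_kpts=17):
        l = np.loadtxt(path, dtype=np.float32, ndmin=2)
        assert l.shape[1] == 5 + 3 * num_kpts, f'expected {5 + 3 * num_kpts} columns, got {l.shape[1]}'
        assert ((l[:, 1:5] >= 0) & (l[:, 1:5] <= 1)).all(), 'non-normalized box coordinates'
        assert (l[:, 5::3] <= 1).all(), 'non-normalized keypoint x coordinates'
        assert (l[:, 6::3] <= 1).all(), 'non-normalized keypoint y coordinates'

    check_label_file('labels/train/file_name1.txt')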

    Getting the label format right is the only tricky part. The rest is storing the images in the right directory structure:

    images/
        train/
            file_name1.jpg
            ...
        test/
        val/
    labels/
        train/
            file_name1.txt
            ...
        test/
        val/
    train.txt
    test.txt
    val.txt
    

    where train.txt contains the paths to the training images (test.txt and val.txt list theirs in the same way). Its contents look like this

    ./images/train/file_name1.jpg
    ...
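
    The three list files are easy to generate once the images are in place; a minimal sketch (assumes .jpg images already sorted into the split folders shown above):

    from pathlib import Path

    for split in ('train', 'test', 'val'):
        with open(f'{split}.txt', 'w') as f:
            for img in sorted(Path('images', split).glob('*.jpg')):
                f.write(f'./{img.as_posix()}\n')  # e.g. ./images/train/file_name1.jpg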