Tags: pytorch, yolo, pose-estimation, keypoint

How to prepare a custom keypoints dataset for WongKinYiu/Yolov7 Pose Estimation?



Solution

  • The keypoints format is described here

    https://cocodataset.org/#format-data

    In particular, this fragment

    annotation{
        "keypoints" : [x1,y1,v1,...],
        ...
    }
    

    says that keypoints are stored as a flat array [x1, y1, v1, x2, y2, v2, ...], where each v is a visibility flag: 0 = not labeled (x and y are 0 as well), 1 = labeled but not visible, and 2 = labeled and visible.
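
    For instance, splitting such an array back into (x, y, v) triples (a minimal illustration with made-up values):

    kpts = [230, 150, 2, 0, 0, 0]  # made up: first keypoint labeled and visible, second not labeled
    triples = [tuple(kpts[i:i + 3]) for i in range(0, len(kpts), 3)]
    # -> [(230, 150, 2), (0, 0, 0)]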

    The official yolov7 pose branch, https://github.com/WongKinYiu/yolov7/tree/pose, links to a prepared COCO dataset [Keypoints Labels of MS COCO 2017]. Download it, extract it, and go to the directory labels/train2017. Open any of the .txt files and you will see lines looking something like this

    0 0.671279 0.617945 0.645759 0.726859 0.519751 0.381250 2.000000 0.550936 0.348438 2.000000 0.488565 0.367188 2.000000 0.642412 0.354687 2.000000 0.488565 0.395313 2.000000 0.738046 0.526563 2.000000 0.446985 0.534375 2.000000 0.846154 0.771875 2.000000 0.442827 0.812500 2.000000 0.925156 0.964063 2.000000 0.507277 0.698438 2.000000 0.702703 0.942187 2.000000 0.555094 0.950000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
    

    Each such line holds 56 whitespace-separated values: a class id, four box values, and 17 keypoints × 3 values each (COCO's 17 person keypoints). The format is

    class x_center y_center width height kpt1_x kpt1_y kpt1_v kpt2_x kpt2_y kpt2_v ...
    
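
    All coordinates are normalized to [0, 1]: x and width by the image width, y and height by the image height. Note that the box is stored as its center plus size, not as corner points; the conversion to corners happens at load time, as shown in the code below. To produce such lines from existing COCO annotations, here is a minimal sketch (it assumes the standard COCO fields, bbox as [x, y, w, h] in absolute pixels and keypoints as the flat array; the function name is just for illustration):

    def coco_to_yolo_pose_line(ann, img_w, img_h, class_id=0):
        # box: COCO's absolute top-left corner + size -> normalized center + size
        x, y, w, h = ann["bbox"]
        parts = [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]
        # keypoints: normalize x and y, keep the COCO visibility flag as-is;
        # unlabeled keypoints (v == 0) stay as "0 0 0", as in the sample line above
        kpts = ann["keypoints"]
        for i in range(0, len(kpts), 3):
            kx, ky, v = kpts[i], kpts[i + 1], kpts[i + 2]
            parts += [kx / img_w, ky / img_h, float(v)] if v > 0 else [0.0, 0.0, 0.0]
        return str(class_id) + " " + " ".join(f"{p:.6f}" for p in parts)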

    This is the code (from utils/general.py in the pose branch) that converts these normalized labels to pixel coordinates at load time

    
    import numpy as np
    import torch

    def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0, kpt_label=False):
        # Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
        # it does the same operation as above for the key-points
        y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
        y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top left x
        y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top left y
        y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom right x
        y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom right y
        if kpt_label:
            # each keypoint occupies only 2 columns here, so the visibility
            # flags have already been stripped by the time this runs
            num_kpts = (x.shape[1] - 4) // 2
            for kpt in range(num_kpts):
                for kpt_instance in range(y.shape[0]):
                    # only denormalize labeled keypoints; unlabeled ones stay at 0
                    if y[kpt_instance, 2 * kpt + 4] != 0:
                        y[kpt_instance, 2 * kpt + 4] = w * y[kpt_instance, 2 * kpt + 4] + padw
                    if y[kpt_instance, 2 * kpt + 1 + 4] != 0:
                        y[kpt_instance, 2 * kpt + 1 + 4] = h * y[kpt_instance, 2 * kpt + 1 + 4] + padh
        return y
    
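
    Applying the same arithmetic by hand to the box of the sample line above (with the function's 640×640 defaults and no padding) makes the conversion concrete:

    cx, cy, bw, bh = 0.671279, 0.617945, 0.645759, 0.726859  # box from the sample label line
    x1 = 640 * (cx - bw / 2)  # ~223.0, top left x in pixels
    y1 = 640 * (cy - bh / 2)  # ~162.9, top left y
    x2 = 640 * (cx + bw / 2)  # ~636.3, bottom right x
    y2 = 640 * (cy + bh / 2)  # ~628.1, bottom right y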

    The function is called from

    labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1], kpt_label=self.kpt_label)
    

    Note the 1 offset in labels[:, 1:], which skips the class id. The label coordinates must be normalized to [0, 1], as enforced by these assertions (the 5::3 and 6::3 slices pick out the keypoint x and y columns of the raw labels, which at that point still include the visibility flags)

    assert (l[:, 5::3] <= 1).all(), 'non-normalized or out of bounds coordinate labels'  # keypoint x columns
    assert (l[:, 6::3] <= 1).all(), 'non-normalized or out of bounds coordinate labels'  # keypoint y columns
    
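
    Before training, it is worth running equivalent checks over your own label files. A small sketch (the path is a placeholder, and num_kpts=17 assumes COCO-style person keypoints):

    import numpy as np

    def check_label_file(path, num_kpts=17):
        l = np.loadtxt(path, dtype=np.float32, ndmin=2)
        assert l.shape[1] == 5 + 3 * num_kpts, f'expected {5 + 3 * num_kpts} columns, got {l.shape[1]}'
        assert ((l[:, 1:5] >= 0) & (l[:, 1:5] <= 1)).all(), 'non-normalized box coordinates'
        assert (l[:, 5::3] <= 1).all(), 'non-normalized keypoint x coordinates'
        assert (l[:, 6::3] <= 1).all(), 'non-normalized keypoint y coordinates'

    check_label_file('labels/train/file_name1.txt')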

    Getting the label format right is the only tricky part. The rest is storing the images in the right directory structure:

    images/
        train/
            file_name1.jpg
            ...
        test/
        val/
    labels/
        train/
            file_name1.txt
            ...
        test/
        val/
    train.txt
    test.txt
    val.txt
    

    where train.txt contains the paths to the training images (test.txt and val.txt list theirs in the same way). Its contents look like this

    ./images/train/file_name1.jpg
    ...
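
    The three list files are easy to generate once the images are in place; a minimal sketch (assumes .jpg images already sorted into the split folders shown above):

    from pathlib import Path

    for split in ('train', 'test', 'val'):
        with open(f'{split}.txt', 'w') as f:
            for img in sorted(Path('images', split).glob('*.jpg')):
                f.write(f'./{img.as_posix()}\n')  # e.g. ./images/train/file_name1.jpg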