Does PyTorch allow applying given transformations to the bounding box coordinates of an image?


In PyTorch, I know that certain image processing transformations can be composed like this:

import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

In my case, each image has a corresponding annotation file with bounding box coordinates in YOLO format. Does PyTorch allow applying these transformations to the bounding box coordinates of the image as well, so that they can later be saved as new annotations? Thanks.


Solution

  • The transformations that you used as examples do not change the bounding box coordinates. ToTensor() converts a PIL image to a torch tensor and Normalize() normalizes the channels of the image; neither moves any pixels, so the existing boxes remain valid.

    Geometric transformations such as RandomCrop() and RandomRotation() do move pixels, so they cause a mismatch between the location of the bounding box and the (modified) image unless the same transformation is also applied to the box coordinates.
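
To make the mismatch concrete, here is a minimal sketch of the correction a crop requires. It assumes the box is stored as pixel corners (x_min, y_min, x_max, y_max) and that the crop is described by top, left, height and width, the same order that TF.crop() and RandomCrop.get_params() use. The helper name crop_box is hypothetical and not part of torchvision.

```python
def crop_box(box, top, left, height, width):
    """Shift a pixel box (x_min, y_min, x_max, y_max) into the frame of a crop.

    Returns None when the box falls entirely outside the crop.
    """
    x_min, y_min, x_max, y_max = box
    # Move the origin to the crop's top-left corner, then clip to the crop size.
    x_min = min(max(x_min - left, 0), width)
    x_max = min(max(x_max - left, 0), width)
    y_min = min(max(y_min - top, 0), height)
    y_max = min(max(y_max - top, 0), height)
    if x_max <= x_min or y_max <= y_min:
        return None
    return (x_min, y_min, x_max, y_max)


# A box spanning (50, 40) to (120, 100), cropped with top=30, left=20,
# height=60, width=80: the box is shifted and its far corner is clipped.
print(crop_box((50, 40, 120, 100), top=30, left=20, height=60, width=80))
# (30, 10, 80, 60)
```

Without this shift the old coordinates would still refer to the uncropped image, which is exactly the mismatch described above.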

    However, PyTorch makes it easy to write your own transformations, which gives you control over what happens to the bounding box coordinates.

    Docs for more details: https://pytorch.org/docs/stable/torchvision/transforms.html#functional-transforms

    As an example (modified from the documentation):

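    import math
    import random
    import torchvision.transforms.functional as TF

    def my_rotation(image, bounding_box):
        # image: PIL image; bounding_box: (x_min, y_min, x_max, y_max) in pixels
        if random.random() > 0.5:
            angle = random.randint(-30, 30)
            image = TF.rotate(image, angle)  # counter-clockwise about the center
            # TF.rotate only accepts images, so rotate the box corners by hand
            # and keep the axis-aligned box that encloses them (not clipped).
            w, h = image.size
            t = math.radians(angle)
            x0, y0, x1, y1 = bounding_box
            corners = [(x - w / 2, y - h / 2) for x in (x0, x1) for y in (y0, y1)]
            xs = [w / 2 + dx * math.cos(t) + dy * math.sin(t) for dx, dy in corners]
            ys = [h / 2 - dx * math.sin(t) + dy * math.cos(t) for dx, dy in corners]
            bounding_box = (min(xs), min(ys), max(xs), max(ys))
        # more transforms ...
        return image, bounding_box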
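
Since the question mentions YOLO annotations and saving them afterwards, here is a rough sketch of the conversion around such a transform. It assumes the common YOLO text layout of one line per object, "<class> <x_center> <y_center> <width> <height>", with all four values normalized by the image size. The helper names (yolo_to_corners, corners_to_yolo, yolo_line) are made up for illustration.

```python
def yolo_to_corners(box, img_w, img_h):
    """Normalized (x_center, y_center, width, height) -> pixel corners."""
    xc, yc, w, h = box
    return ((xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h)


def corners_to_yolo(box, img_w, img_h):
    """Pixel corners (x_min, y_min, x_max, y_max) -> normalized YOLO box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2 / img_w, (y_min + y_max) / 2 / img_h,
            (x_max - x_min) / img_w, (y_max - y_min) / img_h)


def yolo_line(class_id, box):
    """Format one annotation line: '<class> <xc> <yc> <w> <h>'."""
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in box)


corners = yolo_to_corners((0.5, 0.5, 0.25, 0.5), img_w=640, img_h=480)
print(corners)  # (240.0, 120.0, 400.0, 360.0)
# ... apply a geometric transform to `corners` here ...
print(yolo_line(0, corners_to_yolo(corners, img_w=640, img_h=480)))
# 0 0.500000 0.500000 0.250000 0.500000
```

Writing each yolo_line(...) result to a .txt file next to the transformed image gives you the new annotations. If a transform changes the image size, pass the new width and height to corners_to_yolo.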
    

    Hope that helps =)