python machine-learning computer-vision pytorch resnet

Proper dataloader setup to train fasterrcnn-resnet50 for object detection with PyTorch

I am trying to train PyTorch's torchvision.models.detection.fasterrcnn_resnet50_fpn to detect objects in my own images.

According to the documentation, this model expects a list of images and a list of dictionaries with 'boxes' and 'labels' as keys. So my dataset's __getitem__() looks like this:

def __getitem__(self, idx):
    # load images
    _, img = self.images[idx].getImage()
    img = Image.fromarray(img, mode='RGB')
    objects = self.images[idx].objects

    boxes = []
    labels = []
    for o in objects:
        # append bbox to boxes
        boxes.append([o.x, o.y, o.x+o.width, o.y+o.height])
        # append the 4th char of class_id, the number of lights (1-4)
        labels.append(int(str(o.class_id)[3]))

    # convert everything into a torch.Tensor
    boxes = torch.as_tensor(boxes, dtype=torch.float32)
    labels = torch.as_tensor(labels, dtype=torch.int64)

    target = {}
    target["boxes"] = boxes
    target["labels"] = labels

    # transforms consists only of transforms.Compose([transforms.ToTensor()]) for the time being
    if self.transforms is not None:
        img = self.transforms(img)

    return img, target

As far as I can tell, it returns exactly what the model asks for. My dataloader looks like this:

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=False, num_workers=2)

However, when it gets to this stage: for images, targets in dataloaders[phase]: it raises

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 12 and 7 in dimension 1 at C:\w\1\s\windows\pytorch\aten\src\TH/generic/THTensor.cpp:689

Can someone point me in the right direction?

Solution

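@jodag was right, I had to write a separate collate function so that the net receives the data the way it is supposed to. The default collate function tries to stack the targets of a batch into single tensors, which fails when the images contain different numbers of boxes (12 and 7 in the error above). In my case I only needed to bypass the default function.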
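
For anyone landing here, this is a minimal sketch of what such a pass-through collate function could look like. The exact function used in the solution was not posted, so the name collate_fn and the stand-in samples below are assumptions for illustration only. The samples are plain Python objects instead of tensors so that the snippet runs without torch; the grouping logic does not depend on the element types.

```python
def collate_fn(batch):
    """Group a batch of (image, target) pairs without stacking anything.

    `batch` is a list of whatever __getitem__ returns. The result is a
    pair (images, targets), each a tuple with one entry per sample, so
    per-image targets may hold different numbers of boxes.
    """
    return tuple(zip(*batch))


# Hypothetical stand-in samples: strings instead of image tensors and
# nested lists instead of box tensors (12 boxes vs. 7 boxes).
batch = [
    ("img0", {"boxes": [[0, 0, 10, 10]] * 12, "labels": [1] * 12}),
    ("img1", {"boxes": [[0, 0, 10, 10]] * 7, "labels": [2] * 7}),
]

images, targets = collate_fn(batch)
print(len(images), len(targets[0]["boxes"]), len(targets[1]["boxes"]))

# With the real dataset, the function would be passed to the loader:
# data_loader = torch.utils.data.DataLoader(
#     dataset, batch_size=4, shuffle=False, num_workers=2,
#     collate_fn=collate_fn)
```

Defining it as a module-level function rather than a lambda is probably the safer choice here, since worker processes on Windows (num_workers > 0) need to pickle the collate function.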