Tags: python, machine-learning, pytorch, pytorch-lightning, pytorch-dataloader

PyTorch Dataloader - list indices must be integers or slices, not list


I have implemented a COCO dataset as follows:

from torch.utils.data import Dataset
from detr.datasets.coco import CocoDetection


class MyCoco(CocoDetection):
    def __init__(self,
                 img_folder,
                 ann_file,
                 transform=None) -> None:

        super().__init__(img_folder, ann_file, transform, return_masks=True)

    def __getitem__(self, idx):
        img, target = super(MyCoco, self).__getitem__(idx)
        return img, target

Then I defined a batch sampler and dataloader as follows:

import torch
from torch.utils.data import DataLoader

my_coco = MyCoco(
        settings.datasets.img_folder,
        settings.datasets.ann_file
)

sampler_train = torch.utils.data.RandomSampler(my_coco)
batch_sampler_train = torch.utils.data.BatchSampler(sampler_train, 
                                                    batch_size=32, 
                                                    drop_last=True)

data_loader_train = DataLoader(my_coco, 
                               sampler=batch_sampler_train,
                               collate_fn=collate_fn, 
                               num_workers=1)

When I try to iterate over the loader, I get an error:

for a in data_loader_train:
    print(a)
    break

TypeError: list indices must be integers or slices, not list

Looking into the DataLoader internals, the indices are for some reason wrapped in another list. I don't understand why, and more importantly, how to fix it.
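The wrapping can be reproduced without COCO at all. A minimal sketch with a hypothetical `ToyDataset` (standing in for `MyCoco`, backed by a plain Python list) shows what happens when a `BatchSampler` is passed via `sampler=`: the loader treats each list of indices as a single index, so `__getitem__` receives a list.

```python
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler, BatchSampler

class ToyDataset(Dataset):
    """Hypothetical stand-in for MyCoco, backed by a plain Python list."""
    def __init__(self):
        self.data = list(range(10))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # list.__getitem__ only accepts an int or a slice
        return self.data[idx]

ds = ToyDataset()
batch_sampler = BatchSampler(RandomSampler(ds), batch_size=4, drop_last=True)

# Wrong: `sampler=` expects single indices, but a BatchSampler yields
# lists of indices, so the fetcher ends up calling ds[[i0, i1, i2, i3]].
loader = DataLoader(ds, sampler=batch_sampler)

try:
    next(iter(loader))
except TypeError as e:
    print(e)  # list indices must be integers or slices, not list
```

This is exactly the error above: `self.data[idx]` fails because `idx` is a list, not an integer.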


Solution

  • The DataLoader class of PyTorch has many arguments, and some of them are mutually exclusive.

    Since you have already defined a batch sampler, you need to pass it as batch_sampler instead of sampler. The sampler argument expects individual indices, not batches of indices.

    Then, the code would be:

    data_loader_train = DataLoader(my_coco, 
                                   batch_sampler=batch_sampler_train,
                                   collate_fn=collate_fn, 
                                   num_workers=1)
    

    Alternatively, pass the sampler_train sampler together with the batch size directly to the DataLoader, and let it build the batch sampler internally:

    data_loader_train = DataLoader(my_coco, 
                                   sampler=sampler_train,
                                   batch_size=32,
                                   drop_last=True,
                                   collate_fn=collate_fn, 
                                   num_workers=1)
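
    As a sanity check, here is a minimal sketch using a hypothetical `ToyDataset` (standing in for `MyCoco`, with the default collate function instead of a custom `collate_fn`) showing that both variants yield properly collated batches:

    ```python
    import torch
    from torch.utils.data import Dataset, DataLoader, RandomSampler, BatchSampler

    class ToyDataset(Dataset):
        """Hypothetical stand-in for MyCoco, returning bare integers."""
        def __init__(self):
            self.data = list(range(10))

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx]

    ds = ToyDataset()

    # Option 1: hand the BatchSampler to `batch_sampler=`
    # (mutually exclusive with batch_size, shuffle, sampler, drop_last).
    loader_a = DataLoader(
        ds,
        batch_sampler=BatchSampler(RandomSampler(ds), batch_size=4, drop_last=True),
    )

    # Option 2: plain sampler plus batch_size/drop_last; DataLoader
    # builds the equivalent BatchSampler internally.
    loader_b = DataLoader(ds, sampler=RandomSampler(ds), batch_size=4, drop_last=True)

    for loader in (loader_a, loader_b):
        batch = next(iter(loader))
        print(batch.shape)  # torch.Size([4])
    ```

    With drop_last=True and 10 samples, each loader produces two batches of four elements per epoch; the default collate function stacks the integers into a tensor of shape (4,).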