Tags: pytorch, multilabel-classification, multiclass-classification

Multilabel classification using ResNet - loading sample using the dataloader takes a long time


I need to implement a ResNet-based multiclass classifier, and I am using this notebook as a starting point. At the moment, I am just going through the notebook, checking that all steps work fine. When I try to load a sample with the dataloader (command: sample = next(iter(train_loader))), I get no results even after waiting for more than an hour. Why is that?

The dataloader is defined in this cell:

#Pre-processing transformations
data_transforms = transforms.Compose([
        transforms.Resize((224,224)),
        transforms.ToTensor(),
        transforms.Normalize((0.5,0.5,0.5), (0.5,0.5,0.5))
    ])

#Getting the data
cardata = CarDataset("./content/content/carimages/car_ims", transform=data_transforms,translation_dict=translation_dict)

#Split the data into training and testing sets
train_len = int(len(cardata) * 0.8)
test_len = len(cardata) - train_len  # remainder, so the lengths always sum to len(cardata)
train_set, val_set = torch.utils.data.random_split(cardata, [train_len, test_len])

#Create the dataloader for each dataset
train_loader = DataLoader(train_set, batch_size=16, shuffle=True, 
                                num_workers=4, drop_last=True)
test_loader = DataLoader(val_set, batch_size=16, shuffle=False, 
                               num_workers=4, drop_last=True)

If I try loading the data on the GPU using

train_set.cardata.to(torch.device("cuda:0"))  # put data into GPU entirely
train_set.to(torch.device("cuda:0"))

I get the error 'Subset' object has no attribute 'cardata'. Am I doing something wrong? Is it just normal that the dataloader takes so long to load the dataset images? Thanks!


Solution

  • There seems to be no error in the train/val split and loader definitions shown above. The hang is more likely caused by something inside the CarDataset class (for example, slow or blocking I/O in __getitem__). Setting num_workers=0 temporarily can help: without worker processes, any exception or slowdown in the dataset code shows up directly in the main process instead of hanging silently.
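    A minimal way to check this (a sketch; TensorDataset stands in for the notebook's CarDataset, whose definition is not shown here):

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the notebook's train_set: 100 random "images" with class labels.
train_set = TensorDataset(torch.randn(100, 3, 224, 224),
                          torch.randint(0, 10, (100,)))

# num_workers=0 runs the dataset code in the main process, so any exception
# or slowdown inside __getitem__ is visible instead of hanging silently.
debug_loader = DataLoader(train_set, batch_size=16, shuffle=True,
                          num_workers=0, drop_last=True)

start = time.time()
inputs, labels = next(iter(debug_loader))
print(f"loaded one batch {tuple(inputs.shape)} in {time.time() - start:.2f}s")
```

    If this loads quickly, the slowdown is in worker startup or in the real dataset's file I/O rather than in the loader configuration.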

    Also, avoid fetching samples via next(iter(train_loader)): each call builds a fresh iterator and, with num_workers=4, respawns the worker processes. Create the iterator once with train_iter = iter(train_loader) and fetch batches with inputs, labels = next(train_iter) (the older train_iter.next() method has been removed in recent PyTorch versions).
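    For example (a sketch using a synthetic TensorDataset in place of the real data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_set = TensorDataset(torch.randn(64, 3, 224, 224),
                          torch.randint(0, 10, (64,)))
train_loader = DataLoader(train_set, batch_size=16, shuffle=True, drop_last=True)

# Build the iterator once and reuse it; next(iter(train_loader)) would
# construct a brand-new iterator (and worker pool) on every call.
train_iter = iter(train_loader)
inputs, labels = next(train_iter)   # first batch
inputs, labels = next(train_iter)   # second batch from the same iterator
```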

    Additionally, moving an entire Dataset onto a GPU with .to() is not supported: a Dataset is not a tensor. Note also that the Subset returned by random_split stores the wrapped dataset in its .dataset attribute (not .cardata), which is why you get the AttributeError. The usual approach is to move each mini-batch tensor to the device as you iterate: inputs = inputs.to(torch.device("cuda"))
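    A sketch of the usual pattern (the toy model and synthetic data here are placeholders, not the notebook's ResNet or CarDataset):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_set = TensorDataset(torch.randn(32, 3, 224, 224),
                          torch.randint(0, 10, (32,)))
train_loader = DataLoader(train_set, batch_size=16)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10)).to(device)

for inputs, labels in train_loader:
    # Only the current mini-batch is moved to the device; the dataset stays on CPU.
    inputs = inputs.to(device)
    labels = labels.to(device)
    outputs = model(inputs)
```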

    Any advice or corrections to this answer are welcome.