Using GPU with fastai and pytorch

I am using fastai and PyTorch for image classification and tried training the model on Google Colab. Training takes a long time, and I suspect the GPU is not being used properly. This is my setup:

import os
import torch
import torchvision as tv
import matplotlib.pyplot as plt
import numpy as np
from torchinfo import summary as torchinfo_summary

#cuda configs
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

#loading dataset
data_dir='/content/drive/MyDrive/tech_related/machine_learning_related/pytorch-etic/data/cat_deer_dog_horse'
#data_dir = os.path.join('..','data','cat_deer_dog_horse')
print(os.listdir(data_dir))
data_classes = os.listdir(os.path.join(data_dir,'train'))
print(data_classes)

#making torch defined dataloaders with dataset
stats = ((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))  #means and stds of each channel in images of cifar10
train_tfms = tv.transforms.Compose([
    tv.transforms.RandomCrop(32,padding=4,padding_mode='reflect'),
    tv.transforms.RandomHorizontalFlip(),
    tv.transforms.ToTensor()
])
valid_tfms = tv.transforms.Compose([
    tv.transforms.ToTensor()
])

train_ds = tv.datasets.ImageFolder(os.path.join(data_dir,'train'),train_tfms)
valid_ds = tv.datasets.ImageFolder(os.path.join(data_dir,'test'),valid_tfms)
batch_size = 64
train_dl = torch.utils.data.DataLoader(train_ds,batch_size,shuffle=True,pin_memory=True)
valid_dl = torch.utils.data.DataLoader(valid_ds,batch_size,shuffle=True,pin_memory=True)

model_ = tv.models.mobilenet_v2(pretrained=False, num_classes=len(data_classes)).to(device)

#here comes the training part, finding optimum learning rate.
from fastai.vision.all import *
data = DataLoaders(train_dl,valid_dl)
learner = Learner(data, model_, loss_func=torch.nn.functional.cross_entropy, opt_func=Adam, metrics=accuracy)

lr_min,lr_steep,lr_slide,lr_valley = learner.lr_find(suggest_funcs=(minimum,steep,slide,valley))

I checked the Colab runtime type and it is set to GPU. Can someone tell me what I am doing wrong?

Solution

It looks to me like you're using the GPU. You can confirm this by running next(learner.model.parameters()).is_cuda immediately before lr_find; it returns True when the model's weights are on the GPU.
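As a minimal sketch of that check, the snippet below uses a small nn.Linear layer as a hypothetical stand-in for learner.model; the same expression should work on the MobileNetV2 model from the question. It falls back to the CPU when no GPU is visible, so it runs anywhere.

```python
import torch
from torch import nn

# Same device selection as in the question.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for learner.model; any nn.Module works here.
model = nn.Linear(8, 4).to(device)

# parameters() is a method, so it must be called before next().
on_gpu = next(model.parameters()).is_cuda

print("CUDA available:", torch.cuda.is_available())
print("Model on GPU:", on_gpu)
if on_gpu:
    print("GPU name:", torch.cuda.get_device_name(0))
```

If "Model on GPU" prints False on a GPU runtime, the model was never moved with .to(device).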

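What jumps out is that you are not loading pretrained weights (pretrained=False) in your model. Training from scratch usually needs more epochs to reach a given accuracy, which increases the total training time. Set pretrained to True and see if you get good results sooner. Note that torchvision's pretrained weights belong to the 1000-class ImageNet classifier, so load them without num_classes and then replace the final classifier layer with one sized for your classes.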

Or, if you are concerned about the per-epoch training time, the other possible cause is a very large dataset. Large datasets are usually a good problem to have: even if each epoch takes longer, the total wall time needed to train without overfitting is probably similar or better than with a small dataset.

Just to be thorough, lr_find does not actually train the model; it only runs a short sweep to suggest a learning rate. To train, use fit or fit_one_cycle with the lr_steep value returned by lr_find.