
How do you test a custom dataset in PyTorch?


I've been following PyTorch tutorials that use built-in datasets, which let you choose whether you want the training data or the test data... But now I'm using a .csv file and a custom dataset.

import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, root, n_inp):
        self.df = pd.read_csv(root)
        self.data = self.df.to_numpy()
        # first n_inp columns are the features, the remaining columns are the targets;
        # cast to float32 so the tensors match the default dtype of the model weights
        # (adjust the target dtype if your loss expects integer class labels)
        self.x, self.y = (torch.from_numpy(self.data[:, :n_inp]).float(),
                          torch.from_numpy(self.data[:, n_inp:]).float())
    def __getitem__(self, idx):
        return self.x[idx, :], self.y[idx, :]
    def __len__(self):
        return len(self.data)

How can I tell PyTorch not to train on my test_dataset, so I can use it as a reference for how accurate my model is?

train_dataset = MyDataset("heart.csv", input_size)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_dataset = MyDataset("heart.csv", input_size)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True)

Solution

  • In PyTorch, a custom dataset inherits from the class Dataset. It mainly contains two methods: __len__(), which specifies the length of your dataset object to iterate over, and __getitem__(), which returns one sample at a time (the DataLoader then collates these samples into batches).
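
    For instance, you can sanity-check both methods directly, before involving any DataLoader (a quick illustrative snippet reusing "heart.csv" and input_size from your code):

    dataset = MyDataset("heart.csv", input_size)
    print(len(dataset))   # calls __len__()
    x0, y0 = dataset[0]   # calls __getitem__(0): one (features, target) pair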

    Once the dataloader objects are initialized (train_loader and test_loader as specified in your code), you need to write a train loop and a test loop.

    def train(model, optimizer, loss_fn, dataloader):
        model.train()  # training-mode behaviour (dropout active, batch norm updating)
        for i, (inputs, gt) in enumerate(dataloader):
            if torch.cuda.is_available():  # move the batch to the GPU if you have one
                inputs, gt = inputs.cuda(non_blocking=True), gt.cuda(non_blocking=True)
            predicted = model(inputs)
            loss = loss_fn(predicted, gt)
            optimizer.zero_grad()  # clear gradients left over from the previous step
            loss.backward()        # backpropagate
            optimizer.step()       # update the weights
    

    and your test loop should be:

    def test(model, loss_fn, dataloader):
        model.eval()  # evaluation-mode behaviour (dropout off, batch norm frozen)
        total_loss = 0.0
        with torch.no_grad():  # no gradients are needed at inference time
            for i, (inputs, gt) in enumerate(dataloader):
                if torch.cuda.is_available():
                    inputs, gt = inputs.cuda(non_blocking=True), gt.cuda(non_blocking=True)
                predicted = model(inputs)
                total_loss += loss_fn(predicted, gt).item()
        return total_loss / len(dataloader)  # average test loss over all batches
    

    In addition, you can use a metrics dictionary to log your predictions, loss, epoch numbers, etc. The main difference between the training and test loops is that we exclude backpropagation (zero_grad(), backward(), step()) at the inference stage and wrap the forward pass in torch.no_grad() so that no gradients are computed.
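
    For example, a minimal metrics dictionary could collect the average test loss per epoch (an illustrative sketch that anticipates the final loop shown below; the key name is arbitrary, and it assumes test() returns its average loss as in the version above):

    metrics = {"test_loss": []}
    for epoch in range(1, epochs + 1):
        train(model, optimizer, loss_fn, train_loader)
        metrics["test_loss"].append(test(model, loss_fn, test_loader))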

    Finally,

    for epoch in range(1, epochs + 1):
        train(model, optimizer, loss_fn, train_loader)
        test(model, loss_fn, test_loader)
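
    As a last note: both loaders in your code read the same heart.csv, so the test loss would only measure performance on data the model has already seen. One common way to get a genuinely held-out test set from a single file is torch.utils.data.random_split (a minimal sketch; the 80/20 ratio is an arbitrary choice):

    from torch.utils.data import random_split

    full_dataset = MyDataset("heart.csv", input_size)
    n_train = int(0.8 * len(full_dataset))   # 80% of the rows for training
    n_test = len(full_dataset) - n_train     # remaining 20% held out for testing
    train_dataset, test_dataset = random_split(full_dataset, [n_train, n_test])

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)  # no need to shuffle at test time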