Tags: python, pytorch, pytorch-dataloader

How to get a single index from a DataSet in PyTorch?


I want to randomly draw a sample from my test DataSet object to perform a prediction using my trained model.

To achieve this, I use the following code block, which raises the error shown below:

import numpy as np
import torch

rng = np.random.default_rng()
ind = rng.integers(0, len(test_ds), (1,))[-1]  # random index into the test dataset

I = test_ds[ind]  # Note: I is a list of tensors of equal size
I = [Ik.to(device) for Ik in I]

with torch.no_grad():
    _, y_f_hat, _, y_f = model.forward_F(I)
    y_f_hat = y_f_hat.cpu().numpy().flatten()
    y_f = y_f.cpu().numpy().flatten()

ERROR: /usr/local/lib/python3.8/dist-packages/torch/nn/modules/flatten.py in forward(self, input)
     44 
     45     def forward(self, input: Tensor) -> Tensor:
---> 46         return input.flatten(self.start_dim, self.end_dim)
     47 
     48     def extra_repr(self) -> str:

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

There is no problem when using the DataLoader:

for I in test_dataloader:
    with torch.no_grad():
        _, y_f_hat, _, y_f = model.forward_F(I)
        y_f_hat = y_f_hat.cpu().numpy().flatten()
        y_f = y_f.cpu().numpy().flatten()
        break

test_ds is the dataset used in test_dataloader.

Notes: running on Google Colab with GPU, Python 3.9.


Solution

  • A DataLoader returns the data as a batch of samples, so the shape of what comes out of it is (B, ...), where B is the batch size and ... are the remaining dimensions (I do not know what your samples look like; for images, for example, the shape is (B, C, H, W), where C, H, and W are the number of channels, the height, and the width, respectively). This leading batch dimension is what PyTorch layers expect. Indexing test_ds directly returns a single sample without it, which is why nn.Flatten raises the IndexError. A minimal sketch of the shape difference follows below.

    As a solution, you can call .unsqueeze(0) on the input tensor(s) before feeding them into the model. Since your I is a list of tensors, unsqueeze each element:

    _, y_f_hat, _, y_f = model.forward_F([Ik.unsqueeze(0) for Ik in I])
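
    Below is a minimal, self-contained sketch of the shape issue. It uses a dummy TensorDataset of fake 3-channel 8x8 images (a stand-in, not your actual dataset or model) to show that indexing the Dataset drops the batch dimension while the DataLoader keeps it, and that .unsqueeze(0) restores it:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Dummy dataset of 10 fake 3-channel 8x8 "images" (assumption; stand-in
    # for test_ds, whose real sample structure is unknown here).
    dummy_ds = TensorDataset(torch.randn(10, 3, 8, 8))
    dummy_dl = DataLoader(dummy_ds, batch_size=4)

    sample = dummy_ds[0][0]           # direct indexing -> torch.Size([3, 8, 8]), no batch dim
    batch = next(iter(dummy_dl))[0]   # DataLoader batch -> torch.Size([4, 3, 8, 8])

    print(sample.shape, batch.shape)

    # unsqueeze(0) adds the missing leading batch dimension, so the single
    # sample looks like a batch of size 1 -> torch.Size([1, 3, 8, 8])
    print(sample.unsqueeze(0).shape)

    The same idea applies to your list of tensors: unsqueeze each element so every tensor carries a batch dimension of 1 before the forward pass.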