Tags: python-3.x, numpy, indexing, pytorch, torch

Get variable from a Numpy array


For an image classification problem with PyTorch, I read in my data as follows:

import scipy.io
emnist = scipy.io.loadmat(DATA_DIR + '/emnist-letters.mat')
data = emnist['dataset']
X_train = data['train'][0, 0]['images'][0, 0]
X_train = X_train.reshape((-1, 28, 28), order='F')

y_train = data['train'][0, 0]['labels'][0, 0]

X_test = data['test'][0, 0]['images'][0, 0]
X_test = X_test.reshape((-1, 28, 28), order='F')

y_test = data['test'][0, 0]['labels'][0, 0]

I aim to create a dataset, using:

train_dataset = torch.utils.data.TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))

Currently, indexing into y_train returns a one-element array rather than a scalar:

>>> y_train[0]
array([23], dtype=uint8)

However, I want train_dataset to contain only the number that's inside the array at the 0th index (in this case 23) instead of the entire array.

How can I change my code so that each label in the resulting TensorDataset is the scalar inside the array (e.g. 23) rather than a one-element array?


Solution

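  • You can use np.squeeze() to remove every axis of length 1 from an array. To remove only one specific axis, pass its index to squeeze. Since y_train[0] is a one-element array, y_train has shape (N, 1), and squeezing it gives shape (N,), so each element becomes a scalar.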

    import numpy as np
    
    arr = np.random.randn(1, 2, 1, 3, 1)
    arr.squeeze().shape # (2, 3)
    
    arr.squeeze(2).shape # (1, 2, 3, 1)
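
Applied to the labels from the question, the same idea might look like the sketch below. The small y_train array here is a made-up stand-in with the same layout as the loaded labels (shape (N, 1), dtype uint8), and the name y_train_flat is introduced only for illustration.

```python
import numpy as np

# Stand-in for the labels loaded from the .mat file: shape (N, 1), dtype uint8.
# The values are invented for illustration.
y_train = np.array([[23], [7], [16]], dtype=np.uint8)

print(y_train.shape)  # (3, 1)
print(y_train[0])     # [23], a one-element array

# Remove only the trailing length-1 axis. Passing axis=1 raises an error if
# that axis is not of length 1, which guards against squeezing the wrong axis.
y_train_flat = y_train.squeeze(axis=1)

print(y_train_flat.shape)  # (3,)
print(y_train_flat[0])     # 23, a scalar
```

Passing torch.from_numpy(y_train_flat) in place of torch.from_numpy(y_train) in the TensorDataset line from the question should then give one scalar label per sample. If the labels are later fed to a loss such as torch.nn.CrossEntropyLoss, they would typically also need to be cast to int64 first, for example with y_train_flat.astype(np.int64).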