ValueError: too many values to unpack while using torch tensors


For a project on neural networks, I am using PyTorch and am working with the EMNIST dataset.

The code that is already given loads in the dataset:

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

And prepares it:

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

Then, when all the configurations of the network are defined, there is a for loop to train the model per epoch:

for i, (images, labels) in enumerate(train_loader):

In the example code this works fine.

For my task, I am given a dataset that I load as follows:

emnist = scipy.io.loadmat("DIRECTORY/emnist-letters.mat")

data = emnist['dataset']
X_train = data['train'][0, 0]['images'][0, 0]
y_train = data['train'][0, 0]['labels'][0, 0]

Then, I create the train_dataset as follows:

train_dataset = np.concatenate((X_train, y_train), axis = 1)
train_dataset = torch.from_numpy(train_dataset)

And use the same step to prepare it:

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

However, when I try to use the same loop as before:

for i, (images, labels) in enumerate(train_loader):

I get the following error:

ValueError: too many values to unpack (expected 2)
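For reference, inspecting one batch from a DataLoader built this way (a minimal sketch using a random stand-in tensor, since the real arrays aren't shown here) makes the cause visible: each batch is a single tensor, not an `(images, labels)` pair.

```python
import torch

# Stand-in for the concatenated EMNIST array: 784 pixel columns + 1 label column
data = torch.randn(100, 785)
loader = torch.utils.data.DataLoader(data, batch_size=32)

# Each batch is ONE tensor whose first dimension is the batch dimension
batch = next(iter(loader))
print(type(batch), batch.shape)  # <class 'torch.Tensor'> torch.Size([32, 785])

# Writing `images, labels = batch` would try to split this tensor along
# dimension 0 into exactly two pieces, raising the ValueError above.
```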

What can I do so that I can train on my dataset with this loop?


Solution

  • The dataset you created from the EMNIST data is a single tensor, so the data loader also produces a single tensor per batch, where the first dimension is the batch dimension. The loop then tries to unpack that tensor along the batch dimension, which fails because your batch size is not two, and it is also not what you want to happen.

    You can use torch.utils.data.TensorDataset to easily create a dataset, which produces a tuple of images and their respective labels, just like the MNIST dataset does.

    train_dataset = torch.utils.data.TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_train))
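    Putting it together, a minimal end-to-end sketch (using random stand-in arrays with the shapes EMNIST typically has, i.e. flat 784-pixel rows and a label column, since the real .mat file isn't available here):

    ```python
    import numpy as np
    import torch

    # Hypothetical stand-ins for the arrays loaded from emnist-letters.mat
    X_train = np.random.rand(1000, 784).astype(np.float32)          # images
    y_train = np.random.randint(1, 27, size=(1000, 1), dtype=np.int64)  # labels

    # TensorDataset yields (image, label) tuples, like dsets.MNIST does
    train_dataset = torch.utils.data.TensorDataset(
        torch.from_numpy(X_train),
        torch.from_numpy(y_train).squeeze(1),  # labels as a 1-D tensor
    )
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_size=64,
                                               shuffle=True)

    # The original training loop now unpacks cleanly
    for i, (images, labels) in enumerate(train_loader):
        pass  # images: [64, 784], labels: [64]
    ```

    Squeezing the label column to a 1-D tensor is optional but matches what loss functions such as `nn.CrossEntropyLoss` expect.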