Tags: python, neural-network, pytorch, speech-recognition, dataloader

Create dataset out of x_train and y_train


How can I feed x_train and y_train into a model for training?
x_train is a tensor of size (3000, 13).
y_train is a tensor of size (3000, 1).
That is, for each row of x_train (shape (1, 13)), the corresponding label is a single digit from y_train. If I do:

train_data = (train_feat, train_labels)
print(train_data[0].shape)  # torch.Size([3082092, 13])
print(train_data[1].shape)  # torch.Size([3082092, 1])

train_loader = data.DataLoader(dataset=train_data,
                               batch_size=7,
                               shuffle=True)

The DataLoader does not yield batches of 7 samples; it returns the whole dataset instead.


Solution

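  • A plain tuple is treated as a dataset with two items (the two tensors), not as one (features, label) pair per sample. Wrap the tensors in a TensorDataset instead, which pairs them along the first dimension: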

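    import torch.utils.data as data_utils

    # Pair each feature row with its label along the first dimension
    dataset = data_utils.TensorDataset(train_feat, train_labels)
    # Each iteration yields (features, labels) in batches of 7 (the last batch may be smaller)
    train_loader = data_utils.DataLoader(dataset, batch_size=7, shuffle=True)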
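To check the batch shapes without the real data, here is a minimal self-contained sketch. The random tensors are hypothetical stand-ins for train_feat and train_labels, using the (3000, 13) and (3000, 1) sizes from the question and assuming the labels are digits 0 to 9:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data (assumption): 3000 samples, 13 features each,
# and one digit label (0-9) per sample.
train_feat = torch.randn(3000, 13)
train_labels = torch.randint(0, 10, (3000, 1))

# A plain tuple has length 2 (the two tensors), not 3000 samples.
print(len((train_feat, train_labels)))  # 2

# TensorDataset pairs the tensors along dim 0, so
# dataset[i] == (train_feat[i], train_labels[i]).
dataset = TensorDataset(train_feat, train_labels)
train_loader = DataLoader(dataset, batch_size=7, shuffle=True)

# Pull one batch and inspect its shapes.
x_batch, y_batch = next(iter(train_loader))
print(len(dataset))       # 3000
print(x_batch.shape)      # torch.Size([7, 13])
print(y_batch.shape)      # torch.Size([7, 1])
print(len(train_loader))  # 429 batches; the last one holds the remaining 4 samples
```

In a training loop you would iterate with `for x_batch, y_batch in train_loader:`. If the model cannot handle a smaller final batch, passing `drop_last=True` to DataLoader discards it.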