lua deep-learning torch mnist

Lua/Torch: Split MNIST dataset into training and test with equal number of labels in each set


I am trying to split the data into a training set (80%) and a test set (20%), but I need to shuffle the data first and then assign an equal number of samples for each label (y, 10 classes) to each set.

How can I do this in Lua/Torch? Thanks!

This is my code so far...

loaded = torch.load(data_file, 'ascii')
Data = {
  data = loaded.data,
  labels = loaded.labels,
  size = 60000
}



Data.data:nDimension()
4

Data.labels:nDimension()
1

Data.data:size()
 60000
     1
    32
    32
[torch.LongStorage of size 4]

Data.labels:size()
 60000
[torch.LongStorage of size 1]

Solution

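  • You could shuffle the data and the labels with the same random permutation, like this:

        dataSize = Data.data:size(1)
        -- randperm returns a tensor of the default type; index() needs a LongTensor
        shuffleIdx = torch.randperm(dataSize):long()
        -- use the same permutation for both so each sample keeps its label
        Data.data = Data.data:index(1, shuffleIdx)
        Data.labels = Data.labels:index(1, shuffleIdx)

    I am not sure about the second part of your question, though.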
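
For the second part of the question (the same number of samples per label in each set), one possible approach is to split class by class. The following is only a sketch under stated assumptions: `balancedSplit` is a hypothetical helper name, not a Torch function. It groups sample indices by label, shuffles each group, and takes the same number of samples from every class. MNIST classes are not perfectly balanced, so the sketch caps every class at the size of the smallest one and drops the surplus. It uses plain Lua only and reads labels through `labels[i]`, which works for a Lua table and should also work for a 1-D Torch tensor.

```lua
-- Hypothetical helper (not part of Torch): returns two Lua tables of indices.
-- labels: Lua table or 1-D tensor, read only through labels[i]
-- n: number of samples; trainFraction: e.g. 0.8 for an 80/20 split
function balancedSplit(labels, n, trainFraction)
  -- Group sample indices by label value.
  local byClass = {}
  for i = 1, n do
    local y = labels[i]
    byClass[y] = byClass[y] or {}
    table.insert(byClass[y], i)
  end

  -- Shuffle each group (Fisher-Yates) and find the smallest class size.
  -- Call math.randomseed(...) beforehand if a different shuffle per run is wanted.
  local minCount = math.huge
  for _, idx in pairs(byClass) do
    for i = #idx, 2, -1 do
      local j = math.random(i)
      idx[i], idx[j] = idx[j], idx[i]
    end
    minCount = math.min(minCount, #idx)
  end

  -- Take the same number of samples from every class; surplus is dropped.
  local nTrain = math.floor(minCount * trainFraction)
  local train, test = {}, {}
  for _, idx in pairs(byClass) do
    for k = 1, minCount do
      table.insert(k <= nTrain and train or test, idx[k])
    end
  end
  return train, test
end

-- Small demo with a plain Lua table: class 1 has 5 samples, class 2 has 7.
local demoLabels = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2}
local demoTrain, demoTest = balancedSplit(demoLabels, #demoLabels, 0.8)
print(#demoTrain, #demoTest)  -- 4 train + 1 test per class: prints 8 and 2

-- With the tensors from the question (needs Torch, so left as comments):
--   trainIdx, testIdx = balancedSplit(Data.labels, Data.labels:size(1), 0.8)
--   trainData   = Data.data:index(1, torch.LongTensor(trainIdx))
--   trainLabels = Data.labels:index(1, torch.LongTensor(trainIdx))
--   testData    = Data.data:index(1, torch.LongTensor(testIdx))
--   testLabels  = Data.labels:index(1, torch.LongTensor(testIdx))
```

The returned index lists are grouped by class, so the training set probably needs one more shuffle before it is fed to a model. If dropping samples is not acceptable, replacing `minCount` with each class's own size gives a proportional (stratified) split instead, where the per-class counts are similar but not identical.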