I am trying to split my data into a training set (80%) and a test set (20%). I need to shuffle the data first and then make sure each set contains an equal number of samples for every label (y, 10 classes).
How can I do this in Lua/Torch? Thanks!
This is my code so far:
loaded = torch.load(data_file, 'ascii')
Data = {
    data = loaded.data,
    labels = loaded.labels,
    size = 60000
}
Data.data:nDimension()
4
Data.labels:nDimension()
1
Data.data:size()
60000
1
32
32
[torch.LongStorage of size 4]
Data.labels:size()
60000
[torch.LongStorage of size 1]
You could shuffle the data like this:
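dataSize = Data.data:size(1)
-- randperm returns a non-Long tensor; index() expects a LongTensor
shuffleIdx = torch.randperm(dataSize):long()
-- apply the same permutation to data and labels so they stay aligned
Data.data = Data.data:index(1, shuffleIdx)
Data.labels = Data.labels:index(1, shuffleIdx)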
I am not sure about the second part of your question, though.
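For the second part, one possible approach is to group the shuffled indices by label, keep the same number of indices from every class, and then split each group 80/20. The sketch below is only that, a sketch: balancedSplit is a hypothetical helper (not a torch function), it assumes labels is a 1-D tensor of class ids, that 0 < trainFraction < 1, and that it is acceptable to drop surplus samples from classes larger than the smallest one.

```lua
require 'torch'

-- Hypothetical helper, not part of torch. Returns a train table and a test
-- table in which every class has exactly the same number of samples.
local function balancedSplit(data, labels, trainFraction)
   local n = labels:size(1)
   local perm = torch.randperm(n):long()   -- shuffle first

   -- group the shuffled sample indices by class label
   local byClass = {}
   for i = 1, n do
      local idx = perm[i]
      local y = labels[idx]
      byClass[y] = byClass[y] or {}
      table.insert(byClass[y], idx)
   end

   -- use the smallest class size so all classes contribute equally
   local perClass = math.huge
   for _, idxs in pairs(byClass) do
      perClass = math.min(perClass, #idxs)
   end
   local nTrain = math.floor(perClass * trainFraction + 0.5)

   -- first nTrain indices of each class go to train, the rest (up to perClass) to test
   local trainIdx, testIdx = {}, {}
   for _, idxs in pairs(byClass) do
      for j = 1, perClass do
         if j <= nTrain then
            table.insert(trainIdx, idxs[j])
         else
            table.insert(testIdx, idxs[j])
         end
      end
   end

   -- build each subset, reshuffling so samples are not grouped by class
   local function gather(idxTable)
      local idx = torch.LongTensor(idxTable)
      idx = idx:index(1, torch.randperm(idx:size(1)):long())
      return {
         data = data:index(1, idx),
         labels = labels:index(1, idx),
         size = idx:size(1)
      }
   end

   return gather(trainIdx), gather(testIdx)
end
```

With the table from the question this could be called as trainData, testData = balancedSplit(Data.data, Data.labels, 0.8). If the classes are not perfectly balanced, some samples are left out of both sets; replacing perClass with #idxs inside the loops would instead give a proportional (stratified) split that keeps everything.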