What is the structure of a Torch dataset?

I have just started using Torch7 and want to build my own dataset for classification. I have already made pixel images and their corresponding labels, but I do not know how to feed this data to Torch. Reading other people's code, I found that they use datasets with the '.t7' extension, which I think are of a tensor type. Is that right? I also wonder how I can convert my pixel images (which I actually made in Matlab from the MNIST dataset) into the t7 format that Torch can read. There must be a defined structure for a dataset (and for its labels) in the t7 format, but I cannot find it.

To sum up, I have pixel images and labels and want to convert them to the t7 format used by Torch.

Thanks in advance!

Solution

A '.t7' file is a Torch object serialized with torch.save, and the datasets you have seen are Lua tables whose named fields hold Tensors. For example, the following Lua code:

require 'torch'
require 'paths'

-- download and unpack a small CIFAR-10 subset if it is not already present
if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
loaded_t7 = torch.load('cifar10-train.t7')
print(loaded_t7)

prints the following in iTorch:

{
  data : ByteTensor - size: 10000x3x32x32
  label : ByteTensor - size: 10000
}

This means the file contains a table of two ByteTensors: one stored under the key "data" (10000 RGB images of 32x32 pixels) and the other under the key "label" (one class label per image).
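Since a .t7 dataset is just a serialized Lua table, you can reproduce this layout yourself. Here is a minimal sketch, assuming Torch7 is installed; the file name `/tmp/toy.t7`, the sizes and the labels are arbitrary values chosen only for illustration:

```lua
require 'torch'

-- Hypothetical toy dataset laid out like cifar10-train.t7:
-- 100 RGB images of 32x32 pixels (all zeros here) and labels cycling through 1..10
n = 100
toy = {
   data  = torch.ByteTensor(n, 3, 32, 32):zero(),
   label = torch.ByteTensor(n),
}
for i = 1, n do
   toy.label[i] = (i - 1) % 10 + 1
end

-- torch.save serializes any Lua table of Tensors (binary format by default)
torch.save('/tmp/toy.t7', toy)

-- torch.load gives back a table with the same keys and Tensor types
loaded = torch.load('/tmp/toy.t7')
print(loaded)
```

Nothing else is stored in the file: the keys "data" and "label" are simply a convention that the training code reading the file has to follow.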

To answer your question: first read your images (with torchx, for example: https://github.com/nicholas-leonard/torchx/blob/master/README.md ), then put them in a table together with a Tensor of labels. The following code is only a draft to help you get started. It assumes that there are two classes, that all your images are in the same folder, and that they are ordered by class (all images of the first class come first).

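require 'torch'
require 'image'
require 'torchx'

--Read all your images (the chosen extension is png)
files = paths.indexdir("/Path/to/your/images/", 'png', true)
data1 = {}
for i=1,files:size() do
   -- load each image with 3 channels (use 1 for grayscale images such as MNIST)
   local img1 = image.load(files:filename(i),3)
   table.insert(data1, img1)
end

--Create the table of labels according to the class of each image
--placeholder value: replace it with the number of images of your first class
number_of_images_of_the_first_class = 100
label1 = {}
for i=1, #data1 do
    if i <= number_of_images_of_the_first_class then
        label1[i] = 1
    else
        label1[i] = 2
    end
end

--Convert the tables to Tensors; the image size is taken from the first image
--so that it matches your data (e.g. 1x28x28 for grayscale MNIST)
label = torch.Tensor(label1)
local c, h, w = data1[1]:size(1), data1[1]:size(2), data1[1]:size(3)
data = torch.Tensor(#data1, c, h, w)
for i=1, #data1 do
    data[i]:copy(data1[i])
end

--Create the table to save
Data_to_Write = { data = data, label = label }

--Save the table in /tmp
torch.save("/tmp/Saved_Data.t7", Data_to_Write)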

It should be possible to write less hideous code, but this version details every step and works with Torch7 and Jupyter 5.0.0.
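Once saved, torch.load("/tmp/Saved_Data.t7") returns the same table. If you then want to train with nn.StochasticGradient, the table also needs dataset[i] to return {input, target} and a size() method. The sketch below follows the setmetatable pattern used in the Torch "60-minute blitz" tutorial; the helper name wrapDataset and the small demo table are hypothetical:

```lua
require 'torch'

-- Hypothetical helper: give a {data = ..., label = ...} table the interface
-- nn.StochasticGradient expects: dataset[i] returns {input, target}
-- and dataset:size() returns the number of examples.
function wrapDataset(dataset)
   setmetatable(dataset, {
      -- called only for keys missing from the table, e.g. numeric indices
      __index = function(t, i)
         return {t.data[i], t.label[i]}
      end
   })
   function dataset:size()
      return self.data:size(1)
   end
   return dataset
end

-- Usage on a tiny in-memory table with the same layout as Saved_Data.t7
-- (4 hypothetical 3x16x16 images, two classes)
demo = wrapDataset({
   data  = torch.Tensor(4, 3, 16, 16):zero(),
   label = torch.Tensor({1, 1, 2, 2}),
})
print(demo:size())            -- number of examples
sample = demo[3]              -- {image Tensor, label}
print(sample[1]:size(), sample[2])
```

For your own file you would call wrapDataset(torch.load("/tmp/Saved_Data.t7")) instead of building the demo table.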

Hope it helps.

Regards