matlab dataset torch mnist

What is the structure of a Torch dataset?


I am starting to use Torch7 and want to build my own dataset for classification. I have already created the pixel images and their corresponding labels, but I do not know how to feed that data to Torch. Reading other people's code, I found that they use datasets with the '.t7' extension, which I think holds a tensor type. Is that right? I also wonder how to convert my pixel images (I made them in Matlab from the MNIST dataset) into a '.t7' file that Torch can read. The '.t7' format must impose some structure on the dataset, but I cannot find it described anywhere, either for the images or for the labels.

To sum up, I have pixel images and labels and want to convert them to a '.t7' file that Torch can read.

Thanks in advance!


Solution

  • '.t7' dataset files are Lua tables of named Tensors. For example, the following Lua code:

    if (not paths.filep("cifar10torchsmall.zip")) then
        os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
        os.execute('unzip cifar10torchsmall.zip')
    end
    Readed_t7 = torch.load('cifar10-train.t7')
    print(Readed_t7)
    

    prints the following in iTorch:

    {
      data : ByteTensor - size: 10000x3x32x32
      label : ByteTensor - size: 10000
    }
    

    This means the file contains a table of two ByteTensors, one stored under the key "data" and the other under the key "label".
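
To make that layout concrete, here is a minimal sketch that builds a toy table with the same two fields, writes it to a temporary '.t7' file, and reads it back. The helper name `save_and_reload`, the toy sizes, and the label values are assumptions chosen for illustration; only `torch.save`, `torch.load`, and the field names "data" and "label" come from the example above.

```lua
require 'torch'

-- Hypothetical helper: save a table to a temporary .t7 file and load it back.
function save_and_reload(tbl)
   local path = os.tmpname()
   torch.save(path, tbl)
   local loaded = torch.load(path)
   os.remove(path)
   return loaded
end

-- Toy stand-in for a real dataset: 4 RGB images of 16x16 pixels, 2 classes.
-- The keys 'data' and 'label' mirror the CIFAR table printed above.
local toy = {
   data  = torch.ByteTensor(4, 3, 16, 16):zero(),
   label = torch.ByteTensor({1, 1, 2, 2}),
}

local loaded = save_and_reload(toy)

-- Fields are reached like any other Lua table entry.
print(loaded.data:size())    -- dimensions of the image tensor
print(loaded.label:size(1))  -- number of labels
print(loaded.label[3])       -- label of the third image
```

Nothing beyond the table itself is needed: any table of Tensors saved with `torch.save` should come back in the same shape from `torch.load`.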

    To answer your question, first read your images (with torchx, for example: https://github.com/nicholas-leonard/torchx/blob/master/README.md ), then put them in a table together with your Tensor of labels. The following code is only a draft to help you out. It assumes that there are two classes, that all your images are in the same folder, and that they are ordered by class.

    require 'torchx' -- provides paths.indexdir
    require 'image'  -- provides image.load
    
    --Read all your dataset (the chosen extension is png)
    files = paths.indexdir("/Path/to/your/images/", 'png', true)
    data1 = {}
    for i=1,files:size() do
       local img1 = image.load(files:filename(i),3)
       table.insert(data1, img1)
    end
    
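    --Create the table of labels according to the class ordering
    --(number_of_images_of_the_first_class is a placeholder: set it to your own count)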
    label1 = {}
    for i=1, #data1 do
        if i <= number_of_images_of_the_first_class then
            label1[i] = 1
        else
            label1[i] = 2
        end
    end
    
    --Convert the tables to Tensors
    label = torch.Tensor(label1)
    --3x16x16 assumes 16x16 RGB images; adjust it to the size of your images
    data = torch.Tensor(#data1,3,16,16)
    for i=1, #data1 do
        data[i] = data1[i]
    end
    
    --Create the table to save
    Data_to_Write = { data = data, label = label }
    
    --Save the table in the /tmp
    torch.save("/tmp/Saved_Data.t7", Data_to_Write)
    

    A cleaner version is certainly possible, but this one spells out every step and works with Torch7 and Jupyter 5.0.0.
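
The draft hard-codes two classes. As a sketch of how the labeling step might extend to more classes, the function below takes the number of images in each class and returns the label table. The name `build_labels` and its `counts` argument are hypothetical, and it keeps the draft's assumption that the images are ordered class by class.

```lua
-- Hypothetical helper: counts[c] is the number of images in class c.
-- Images are assumed to be ordered class by class, as in the draft.
function build_labels(counts)
   local labels = {}
   for class, n in ipairs(counts) do
      for _ = 1, n do
         labels[#labels + 1] = class
      end
   end
   return labels
end

-- Three classes with 3, 2 and 4 images respectively.
local labels = build_labels({3, 2, 4})
print(table.concat(labels, " "))  -- 1 1 1 2 2 3 3 3 3
```

The returned table can then be passed to `torch.Tensor(...)` in the same way the draft converts `label1`.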

    Hope it helps.

    Regards