python deep-learning pytorch kaggle

Using Torchvision ImageFolder with Test Set


I am trying to solve the Dogs-vs-Cats challenge on Kaggle using the sample notebook provided in the Udacity course. I have rearranged the files into two folders, dogs/ and cats/, in the train/ directory so that the ImageFolder class can pick up the categories, but I don't know what to do with the test folder, since I don't have labels for it.

Should I just not use the ImageFolder API (the course used it, so it should be usable, and it is obviously very convenient), or is there some option to use it when the classes are not already known? I could not find anything along these lines in the official documentation, but it should be possible, seeing that the course solution does it that way. Thanks for any help.

Udacity Deep-Learning with Pytorch Solution for Transfer Learning


Solution

  • Usually, in the context of training networks, the "test" set is actually a "validation" set: it's a subset of the labeled examples that the model does not train on and is only evaluated on. This validation set is used for tuning hyperparameters (e.g., number of epochs, learning rate, batch size).
    Therefore, although the validation ("test") set is not used for the actual SGD training, you do have its labels, and they are used to estimate the generalization error of the trained model.
    Since you usually do have the labels for this set, you can read it with the ImageFolder class, the same way as the training set.

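    However, if you have a test set that has no labels at all, you can still use the ImageFolder class to handle it. All you need to do is create a single dummy subfolder inside the test directory and put the images there: ImageFolder assumes the images are stored in subfolders named after their labels, so the dummy subfolder stands in for a "label". Every image then gets the same placeholder class index, which you simply ignore when predicting.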
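
The dummy-subfolder approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `wrap_in_dummy_class` and `load_unlabeled`, the folder name "unknown", and the list of image extensions are all made up for this example, and it assumes the unlabeled images sit loose in the test directory. The only torchvision call used is `datasets.ImageFolder(root, transform=...)`.

```python
from pathlib import Path
import shutil

# Assumption: which file suffixes count as images in the test directory.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def wrap_in_dummy_class(test_dir, dummy_name="unknown"):
    """Move loose images in test_dir into test_dir/dummy_name.

    ImageFolder expects root/<class_name>/<image>, so the dummy folder
    acts as a single placeholder class. Returns the dummy folder path.
    """
    test_dir = Path(test_dir)
    dummy_dir = test_dir / dummy_name
    dummy_dir.mkdir(exist_ok=True)
    # sorted() materializes the listing before any file is moved.
    for path in sorted(test_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            shutil.move(str(path), str(dummy_dir / path.name))
    return dummy_dir


def load_unlabeled(test_dir, transform=None):
    """Load test_dir with ImageFolder; the returned labels are placeholders."""
    # Imported lazily so wrap_in_dummy_class needs only the standard library.
    from torchvision import datasets

    return datasets.ImageFolder(str(test_dir), transform=transform)


# Example usage (the path and the transform are placeholders):
#   wrap_in_dummy_class("test")
#   test_data = load_unlabeled("test", transform=my_transform)
#   test_data.samples is a list of (file_path, class_index) tuples; keep the
#   file_path to match each prediction to its image and ignore class_index.
```

Since there is only one class folder, every sample comes back with class index 0. That value carries no information, so only the model's outputs and the file paths in `samples` matter when writing predictions.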