machine-learning  deep-learning  train-test-split  test-data

What is the difference between test labels and validation labels in machine learning?


I have a question regarding the training and validation of a dataset.

I understand the concept of labels for training data, i.e. y_train. What I don't get is why our testing/validation samples should have labels as well. I assume that by giving labels to the test samples, we define what they are before putting them through the algorithm, right?

Let me put it this way: say I have a dataset of pictures of dogs and cats, and I label them 1 and 2, respectively. If I then want to test my model on a picture of a dog that was not in my training dataset, why should I label it? If I label it 1, I'm saying beforehand that it's a dog, and if I label it 2, it's already a cat.

Can I have a testing/validation dataset without labels?


Solution

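  • The validation dataset is used to fine-tune your model's hyperparameters, while the test set is used to check the accuracy of the final model. In both cases the labels are never given to the model as input: the model sees only the samples, and its predictions are then compared with the true labels. Without the labels, how can you claim your model is correct? You can still run the model on unlabeled samples to get predictions, but you cannot measure how good they are. This holds for supervised learning, so both the validation and the test datasets need labels.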
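
To make the roles of the two label sets concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy one-feature data from the hypothetical `make_data` helper, the choice of a k-nearest-neighbour vote as the model, and the candidate values of `k`. The thing to notice is that `predict` only ever receives the sample `x`; the true labels are used afterwards, inside `accuracy`, to score the predictions.

```python
import random


def make_data(n, seed):
    """Hypothetical toy data: one feature per sample, label 1 (dog) or 2 (cat)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([1, 2])
        center = 0.0 if label == 1 else 3.0
        data.append((rng.gauss(center, 1.0), label))
    return data


def predict(train, x, k):
    """k-nearest-neighbour vote. Only x is passed in, never the true label of x."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train, samples, k):
    """The labels are used here, after prediction, only to score the model."""
    correct = sum(predict(train, x, k) == y for x, y in samples)
    return correct / len(samples)


data = make_data(300, seed=0)
train, val, test = data[:200], data[200:250], data[250:]

# Validation labels: used to choose the hyperparameter k.
candidates = [1, 3, 5, 7]
best_k = max(candidates, key=lambda k: accuracy(train, val, k))

# Test labels: used once, at the end, to estimate the accuracy of the chosen model.
test_acc = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_acc:.2f}")

# An unlabeled sample can still be classified, but the result cannot be scored.
print("prediction for an unlabeled sample:", predict(train, 0.2, best_k))
```

Labeling a test picture as 1 therefore does not tell the model that it is a dog. It only lets you check, after the fact, whether the model's own answer was right.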