Tags: machine-learning, loss, loss-function

Would we ever compute the cost J(θ) on the *test* set?


I'm pretty sure that the answer is no, but wanted to confirm...

When training a neural network or another learning algorithm, we compute the cost function J(θ) as a measure of how well our algorithm fits the training data (higher values mean a worse fit). During training, we generally expect to see J(θ) go down with each iteration of gradient descent.
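
For concreteness, a minimal sketch of that behaviour, assuming a plain linear-regression cost and batch gradient descent (the data and hyper-parameters below are just placeholders):

```python
import numpy as np

# Toy training data: y ≈ 4 + 3x plus noise (purely illustrative)
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(0, 10, 100)]   # bias column + one feature
y = 4 + 3 * X[:, 1] + rng.normal(0, 1, 100)

def cost(theta, X, y):
    """J(theta) = (1/2m) * sum((X @ theta - y)**2)  -- mean-squared-error cost."""
    m = len(y)
    residuals = X @ theta - y
    return (residuals @ residuals) / (2 * m)

theta = np.zeros(2)
alpha = 0.01          # learning rate
m = len(y)

for it in range(500):
    grad = X.T @ (X @ theta - y) / m      # gradient of J(theta) w.r.t. theta
    theta -= alpha * grad
    if it % 100 == 0:
        # J(theta) on the *training* set -- expected to decrease over iterations
        print(f"iteration {it:3d}: J(theta) = {cost(theta, X, y):.4f}")
```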

But I'm just curious, would there ever be a value to computing J(θ) against our test data?

I think the answer is no, because we only evaluate our test data once, so we would only get a single value of J(θ), and a single value seems meaningless except when compared with other values.


Solution

  • Your question touches on a very common terminological ambiguity: the one between the validation and the test sets (the Wikipedia entry and this Cross Validated post may be helpful in resolving it).

    So, assuming that you indeed refer to the test set proper and not the validation one, then:

    1. You are right that this set is used only once, at the very end of the whole modeling process.

    2. You are, in general, not right in assuming that we don't compute the cost J(θ) on this set.

    Elaborating on (2): in fact, the only usefulness of the test set is exactly to evaluate our final model on a set that has not been used at all in the various stages of the fitting process (notice that the validation set has been used indirectly, i.e. for model selection); and in order to evaluate the model, we obviously have to compute the cost.

    I think that a possible source of confusion is that you may have in mind only classification settings (although you don't specify this in your question); true, in that case we are usually interested in the model's performance with respect to a business metric (e.g. accuracy), and not with respect to the optimization cost J(θ) itself. But in regression settings it may very well be the case that the optimization cost and the business metric are one and the same thing (e.g. RMSE, MSE, MAE etc.). And, as I hope is clear, in such settings computing the cost on the test set is by no means meaningless, despite the fact that we don't compare it with other values (it provides an "absolute" performance metric for our final model); see the sketch at the end of this answer.

    You may find this and this answers of mine useful regarding the distinction between loss & accuracy; quoting from these answers:

    Loss and accuracy are different things; roughly speaking, the accuracy is what we are actually interested in from a business perspective, while the loss is the objective function that the learning algorithms (optimizers) are trying to minimize from a mathematical perspective. Even more roughly speaking, you can think of the loss as the "translation" of the business objective (accuracy) to the mathematical domain, a translation which is necessary in classification problems (in regression ones, usually the loss and the business objective are the same, or at least can be the same in principle, e.g. the RMSE)...
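
    To make the regression/classification distinction above concrete, here is a minimal scikit-learn sketch (the datasets, estimators and metric choices are just illustrative assumptions): a regressor whose optimization cost and business metric essentially coincide (MSE/RMSE), and a classifier for which we compute both the loss (log loss) and the business metric (accuracy), each exactly once, on the held-out test set.

    ```python
    import numpy as np
    from sklearn.datasets import make_regression, make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.metrics import mean_squared_error, log_loss, accuracy_score

    # --- Regression: the cost (MSE) is also the performance metric ---
    Xr, yr = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
    Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.2, random_state=0)

    reg = LinearRegression().fit(Xr_train, yr_train)
    test_mse = mean_squared_error(yr_test, reg.predict(Xr_test))
    print(f"test MSE  = {test_mse:.2f}")            # an 'absolute' figure for the final model
    print(f"test RMSE = {np.sqrt(test_mse):.2f}")

    # --- Classification: the loss and the business metric are different things ---
    Xc, yc = make_classification(n_samples=500, n_features=10, random_state=0)
    Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, test_size=0.2, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(Xc_train, yc_train)
    print(f"test log loss = {log_loss(yc_test, clf.predict_proba(Xc_test)):.3f}")
    print(f"test accuracy = {accuracy_score(yc_test, clf.predict(Xc_test)):.3f}")
    ```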