
Regression with numerical labels Y={1,2,3} VS Classification with labels Y={[1 0 0],[0 1 0],[0 0 1]}?


Assume a deep learning problem where there is exactly one object in an image, and we want to classify it as one of:

Y={Cat:1, Dog:2, Panda:3}

Can we address this problem with a neural network in either of the following two ways?

  1. Regression approach: treat it as a regression problem; the last layer has no activation, and the loss is something like mean squared error (MSE), without one-hot encoding. E.g. labels belong to Y={1,2,3}.
  2. Classification approach: one-hot encode the labels so that Y={[1 0 0], [0 1 0], [0 0 1]} and use cross-entropy loss.

The questions are:

a) Do these two systems have the same performance?

b) I have seen "sparse_categorical_crossentropy" in TensorFlow. Does it implicitly convert the labels Y={1,2,3} to Y={[1 0 0], [0 1 0], [0 0 1]}, so that if I use "sparse_categorical_crossentropy" with labels Y={1,2,3} I should make the last layer a softmax layer?


Solution

  • The systems should not be equivalent: different loss functions produce different gradients during backpropagation, so the learning dynamics will differ. In addition, MSE on integer labels imposes an artificial ordering on the classes (predicting Panda when the truth is Cat is penalized more than predicting Dog), whereas cross-entropy treats all wrong classes symmetrically. The final performance may still end up similar, but you need to experiment to see how close it is. In practice, people usually use cross-entropy loss for these kinds of problems.
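A small numeric sketch of that asymmetry, using made-up predictions for illustration: with MSE on integer labels, confusing Cat (1) with Panda (3) costs more than confusing Cat with Dog (2), while cross-entropy only looks at the probability assigned to the true class.

```python
import numpy as np

# Ground truth: the image is a Cat (integer label 1; one-hot index 0).
y_int = 1.0
y_onehot = np.array([1.0, 0.0, 0.0])

# Regression approach: scalar output, MSE loss.
# Both predictions are equally wrong class-wise, but MSE
# penalizes "Panda" four times as hard as "Dog".
mse_dog = (2.0 - y_int) ** 2    # prediction "Dog"   -> 1.0
mse_panda = (3.0 - y_int) ** 2  # prediction "Panda" -> 4.0

# Classification approach: 3 probabilities, cross-entropy loss.
# The loss depends only on the probability given to the true class,
# so both confident mistakes cost the same.
p_dog = np.array([0.1, 0.8, 0.1])    # confidently "Dog"
p_panda = np.array([0.1, 0.1, 0.8])  # confidently "Panda"
ce_dog = -np.sum(y_onehot * np.log(p_dog))
ce_panda = -np.sum(y_onehot * np.log(p_panda))

print(mse_dog, mse_panda)  # 1.0 4.0  (asymmetric penalty)
print(ce_dog == ce_panda)  # True     (symmetric penalty)
```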

    Regarding sparse_categorical_crossentropy in TensorFlow: according to this page, you can either provide your input as logits (no softmax on the last layer) and set from_logits=True, or leave from_logits at its default value (False) and apply a softmax on the last layer.
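    To see what the "sparse" variant does, here is a minimal numpy sketch (not the Keras implementation itself): sparse categorical cross-entropy with integer labels computes exactly the same loss as categorical cross-entropy with one-hot labels, so no explicit conversion is needed on your side. Note that Keras expects 0-based integer labels, so Y={1,2,3} would become {0,1,2}.

    ```python
    import numpy as np

    def softmax(z):
        # Numerically stable softmax over the last axis.
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    # Raw network outputs (logits) for a batch of 3 images, 3 classes.
    logits = np.array([[2.0, 0.5, 0.1],
                       [0.2, 3.0, 0.4],
                       [0.3, 0.1, 2.5]])

    # Integer labels (0-based) and their one-hot equivalents.
    y_sparse = np.array([0, 1, 2])
    y_onehot = np.eye(3)[y_sparse]

    probs = softmax(logits)

    # Categorical cross-entropy with one-hot labels...
    cce = -np.sum(y_onehot * np.log(probs), axis=-1)
    # ...equals sparse categorical cross-entropy, which just picks
    # the log-probability of the true class by index.
    scce = -np.log(probs[np.arange(3), y_sparse])

    print(np.allclose(cce, scce))  # True
    ```

    With from_logits=True you would pass `logits` to the loss directly and skip the softmax layer; with the default from_logits=False, your last layer applies the softmax and the loss receives `probs`.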