tensorflow, loss-function, softmax, sigmoid

Difference between a sigmoid activation function at the output and a linear activation with the sigmoid applied inside the loss


I am fairly new to loss functions, and I have a binary classification problem with 800 outputs (meaning 800 neurons at the output that are not affected by each other; each one independently predicts a probability of being 0 or 1). Now, looking at the documentation at: https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits

It seems that it takes "logits", which are the outputs of the network under a linear activation function, and the sigmoid (needed for the binary classification) is applied inside the loss function.

Looking at the loss function for the softmax activation, a similar approach is used. I am wondering why the activation function is not added to the network outputs, and why instead the loss function receives the linear outputs (logits) and applies the activation internally (sketched below).
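For concreteness, here is a minimal sketch of the setup I mean (TF 1.x style; the batch handling, the 128-dim input, and the placeholder names are made up for illustration):

    import tensorflow as tf

    # Made-up shapes: batches of 128-dim feature vectors,
    # 800 independent binary labels per example.
    features = tf.placeholder(tf.float32, shape=[None, 128])
    labels = tf.placeholder(tf.float32, shape=[None, 800])

    # The last layer has NO activation, so its outputs are the "logits".
    logits = tf.layers.dense(features, 800, activation=None)

    # The sigmoid is applied inside the loss, not in the network.
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))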


Solution

  • There is no modeling difference; it is done for convenience and numerical stability. The sigmoid is applied inside the loss

    • to save you one step elsewhere,
    • to make sure every input to the loss is normalized, i.e. lies in (0, 1),
    • and to let the loss be computed in a numerically stable way: given logits x and labels z, it can use the equivalent form max(x, 0) - x * z + log(1 + exp(-|x|)), which avoids overflow in exp (see the note in the docs you linked).

    If you don't need that convenience (or it is actually a pain for you), simply use another pre-defined loss (tf.losses.log_loss) or write one yourself. :)
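If it helps, here is a minimal sketch of both setups side by side (again TF 1.x style with made-up shapes; only the last layer and the loss differ):

    import tensorflow as tf

    features = tf.placeholder(tf.float32, shape=[None, 128])
    labels = tf.placeholder(tf.float32, shape=[None, 800])

    # Option A: linear outputs; the sigmoid lives inside the loss.
    logits = tf.layers.dense(features, 800, activation=None)
    loss_a = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))

    # Option B: sigmoid in the network; the loss receives probabilities.
    probs = tf.layers.dense(features, 800, activation=tf.nn.sigmoid)
    loss_b = tf.losses.log_loss(labels=labels, predictions=probs)

Both compute the same binary cross-entropy; Option A is simply the numerically safer route, since it never has to take the log of an already-saturated sigmoid output.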