I am fairly new to loss functions and I have an 800-output binary classification problem (meaning 800 neurons at the output that are not affected by each other; the target for each is 0 or 1). Now, looking at the documentation at: https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits
it seems that it uses "logits", which are the outputs of the network with a linear activation function, and the sigmoid (needed for binary classification) is applied inside the loss function.
The loss function for the softmax activation (tf.nn.softmax_cross_entropy_with_logits) takes a similar approach. I am wondering why the activation function is not applied at the network outputs; instead, the loss function receives the linear outputs (logits) and applies the activation internally.
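For concreteness, here is a minimal sketch of the setup I mean (the shapes and random tensors are just placeholders for my actual network):

```python
import tensorflow as tf

batch_size, num_outputs = 32, 800

# Final layer has a *linear* activation: the raw scores ("logits")
# go straight into the loss, not sigmoid probabilities.
logits = tf.random.normal([batch_size, num_outputs])  # stand-in for the 800-unit output layer
labels = tf.cast(tf.random.uniform([batch_size, num_outputs]) > 0.5, tf.float32)

# The sigmoid is applied *inside* the loss, independently per output neuron.
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
print(loss.shape)  # (32, 800): one loss per neuron; take reduce_mean for a scalar
```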
No deep architectural reason, but there is a practical one: the sigmoid is fused into the loss for convenience and numerical stability. Computing log(sigmoid(x)) naively underflows for large negative logits, so the combined op instead evaluates the equivalent stable form max(x, 0) - x * z + log(1 + exp(-|x|)).
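Here is a small sketch of why the fused op matters; the extreme logit values are just illustrative:

```python
import tensorflow as tf

x = tf.constant([-200.0, 0.0, 200.0])  # logits, including extreme values
z = tf.constant([1.0, 1.0, 0.0])       # binary targets

# Naive version: sigmoid first, then cross-entropy by hand.
# sigmoid(-200) underflows to 0 in float32, so log(p) becomes -inf.
p = tf.sigmoid(x)
naive = -(z * tf.math.log(p) + (1 - z) * tf.math.log(1 - p))

# Fused version: uses max(x, 0) - x*z + log(1 + exp(-|x|)), stable everywhere.
fused = tf.nn.sigmoid_cross_entropy_with_logits(labels=z, logits=x)

print(naive.numpy())  # [inf, 0.693..., inf]  -> breaks at the extremes
print(fused.numpy())  # [200., 0.693..., 200.] -> finite and correct
```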
If you don't need that convenience (or it's actually a pain for you), simply use another pre-defined loss such as tf.losses.log_loss, or write one yourself. :)
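If you go that route, a minimal sketch might look like this (I'm assuming TF2 eager mode here, where the old loss lives under tf.compat.v1):

```python
import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])
labels = tf.constant([[1.0, 0.0, 1.0]])

# Apply the sigmoid in the network itself, so it outputs probabilities...
probs = tf.sigmoid(logits)

# ...and pair it with a loss that expects probabilities, not logits.
loss = tf.compat.v1.losses.log_loss(labels=labels, predictions=probs)
print(loss.numpy())
```

Just keep in mind you give up the numerical stability trick above, since the sigmoid and the log are now computed separately.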