I am fairly new to loss functions and I have an 800-output binary classification problem (meaning 800 neurons at the output that are not affected by each other; the target for each is 0 or 1). Now, looking at the documentation at: https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits
it seems that it uses "logits", which are the outputs of the network with a linear activation function, and the sigmoid (needed for binary classification) is applied inside the loss function.
The loss function for the softmax activation (tf.nn.softmax_cross_entropy_with_logits) takes a similar approach. I am wondering why the activation function is not applied at the network outputs; instead, the loss function receives the linear outputs (logits) and applies the activation internally.
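For concreteness, here is a minimal sketch of the setup I mean (the shapes and random tensors are just placeholders for my actual network):

```python
import tensorflow as tf

batch_size, num_outputs = 32, 800

# Final layer has a *linear* activation: the raw scores ("logits")
# go straight into the loss, not sigmoid probabilities.
logits = tf.random.normal([batch_size, num_outputs])  # stand-in for the 800-unit output layer
labels = tf.cast(tf.random.uniform([batch_size, num_outputs]) > 0.5, tf.float32)

# The sigmoid is applied *inside* the loss, independently per output neuron.
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
print(loss.shape)  # (32, 800): one loss per neuron; take reduce_mean for a scalar
```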
No deep architectural reason, but there is a practical one: the sigmoid is fused into the loss for convenience and numerical stability. Computing log(sigmoid(x)) naively underflows for large negative logits, so the combined op instead evaluates the equivalent stable form max(x, 0) - x * z + log(1 + exp(-|x|)).
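Here is a small sketch of why the fused op matters; the extreme logit values are just illustrative:

```python
import tensorflow as tf

x = tf.constant([-200.0, 0.0, 200.0])  # logits, including extreme values
z = tf.constant([1.0, 1.0, 0.0])       # binary targets

# Naive version: sigmoid first, then cross-entropy by hand.
# sigmoid(-200) underflows to 0 in float32, so log(p) becomes -inf.
p = tf.sigmoid(x)
naive = -(z * tf.math.log(p) + (1 - z) * tf.math.log(1 - p))

# Fused version: uses max(x, 0) - x*z + log(1 + exp(-|x|)), stable everywhere.
fused = tf.nn.sigmoid_cross_entropy_with_logits(labels=z, logits=x)

print(naive.numpy())  # [inf, 0.693..., inf]  -> breaks at the extremes
print(fused.numpy())  # [200., 0.693..., 200.] -> finite and correct
```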
If you don't need that convenience (or it's actually a pain for you), simply use another pre-defined loss such as tf.losses.log_loss, or write one yourself. :)
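If you go that route, a minimal sketch might look like this (I'm assuming TF2 eager mode here, where the old loss lives under tf.compat.v1):

```python
import tensorflow as tf

logits = tf.constant([[2.0, -1.0, 0.5]])
labels = tf.constant([[1.0, 0.0, 1.0]])

# Apply the sigmoid in the network itself, so it outputs probabilities...
probs = tf.sigmoid(logits)

# ...and pair it with a loss that expects probabilities, not logits.
loss = tf.compat.v1.losses.log_loss(labels=labels, predictions=probs)
print(loss.numpy())
```

Just keep in mind you give up the numerical stability trick above, since the sigmoid and the log are now computed separately.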