Tags: tensorflow, softmax, cross-entropy

tensorflow softmax_cross_entropy code


Since the source code of tf.nn.softmax_cross_entropy_with_logits in gen_nn_ops is hidden, could anyone explain how TensorFlow computes the cross entropy after the softmax? I mean, the softmax might output an exact 0 because of limited floating-point precision, which would produce a NaN in the cross entropy. Does TensorFlow clip the softmax output to bound it?
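
A minimal sketch of the concern (assuming TensorFlow 2.x eager execution; the values in the comments are approximate):

    import tensorflow as tf

    logits = tf.constant([10.0, 50.0, 100.0, 200.0])
    labels = tf.constant([1.0, 0.0, 0.0, 0.0])  # one-hot target on the smallest logit

    # Naive two-step computation: the softmax underflows to exactly 0 for the
    # small logits, so log(0) = -inf and 0 * -inf = nan poison the sum.
    probs = tf.nn.softmax(logits)
    naive_loss = -tf.reduce_sum(labels * tf.math.log(probs))
    print(probs.numpy())       # ~[0. 0. 0. 1.]
    print(naive_loss.numpy())  # nan

    # The fused op stays finite for the same inputs.
    fused = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    print(fused.numpy())       # ~190.0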


Solution

  • The implementation of tf.nn.softmax_cross_entropy_with_logits goes further down into native C++ code; there is also an XLA implementation. The logits are not bounded, and an exact 0 in the softmax output is possible when one of the logits is much larger than the others. Example:

    >>> session.run(tf.nn.softmax([10.0, 50.0, 100.0, 200.0]))
    array([ 0.,  0.,  0.,  1.], dtype=float32)
    

    If you wish, you can clip the logits just before the softmax, but that is not recommended, because it kills the gradient when the logits are large. A better option is to use batch normalization to make the activations closer to normally distributed. For intuition about why the fused op itself stays numerically stable, see the sketch below.
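
A sketch (not the actual kernel code) of the log-sum-exp formulation that computes the loss directly from the logits, so log(0) never has to be evaluated even when the softmax output underflows; labels are assumed to sum to 1:

    import tensorflow as tf

    def stable_softmax_xent(labels, logits):
        # Shift by the max logit so exp() cannot overflow; the shift cancels out.
        z = logits - tf.reduce_max(logits, axis=-1, keepdims=True)
        # log-softmax computed without ever forming an underflowed probability.
        log_softmax = z - tf.math.log(tf.reduce_sum(tf.exp(z), axis=-1, keepdims=True))
        return -tf.reduce_sum(labels * log_softmax, axis=-1)

    logits = tf.constant([[10.0, 50.0, 100.0, 200.0]])
    labels = tf.constant([[1.0, 0.0, 0.0, 0.0]])
    print(stable_softmax_xent(labels, logits).numpy())      # ~[190.]
    print(tf.nn.softmax_cross_entropy_with_logits(
        labels=labels, logits=logits).numpy())              # ~[190.]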