Since the source code of tf.nn.softmax_cross_entropy_with_logits in gen_nn_ops is hidden, could anyone explain how TensorFlow computes the cross entropy after the softmax? I mean, the softmax can output 0 because of limited floating-point precision, which would lead to a NaN in the cross entropy. Does TensorFlow clip the softmax output to keep it bounded away from 0?
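For example, this is the kind of naive computation I am worried about (hypothetical values, TF 1.x session style):

>>> import tensorflow as tf
>>> session = tf.Session()
>>> logits = tf.constant([1.0, 2.0, 300.0])
>>> labels = tf.constant([0.0, 0.0, 1.0])
>>> probs = tf.nn.softmax(logits)                        # underflows to [0., 0., 1.]
>>> session.run(-tf.reduce_sum(labels * tf.log(probs)))  # 0 * log(0) = 0 * -inf -> nan
nan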
The implementation of tf.nn.softmax_cross_entropy_with_logits goes further down into native C++ code; here is the XLA implementation. The logits are not bounded, and a softmax output of 0 is possible when one of the logits is much bigger than the others. Example:
>>> session.run(tf.nn.softmax([10.0, 50.0, 100.0, 200.0]))
array([ 0., 0., 0., 1.], dtype=float32)
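That said, the fused op does not literally take log(softmax(logits)): losses like this are normally written directly in terms of the logits using the log-sum-exp trick, so the result stays finite even for the extreme values above. Here is a minimal sketch of that standard formulation (an illustration of the usual trick, not the actual TensorFlow kernel), continuing with the same session:

>>> logits = tf.constant([10.0, 50.0, 100.0, 200.0])
>>> labels = tf.constant([1.0, 0.0, 0.0, 0.0])
>>> shifted = logits - tf.reduce_max(logits)  # subtract the max so exp() cannot overflow
>>> session.run(tf.log(tf.reduce_sum(tf.exp(shifted))) - tf.reduce_sum(labels * shifted))
190.0
>>> session.run(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
190.0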
If you wish, you can clip the logits just before the softmax, but it is not recommended, because clipping kills the gradient when the output is large. A better option is to use batch normalization so that the activations are closer to normally distributed.
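As a rough sketch of both options (the clip bounds, layer sizes, and placeholder names below are illustrative assumptions, not part of the original recipe):

import tensorflow as tf

features = tf.placeholder(tf.float32, [None, 128])  # hypothetical pre-logit activations
labels = tf.placeholder(tf.float32, [None, 10])     # one-hot targets
is_training = tf.placeholder(tf.bool, [])

# Option 1 (not recommended): clip the logits. The gradient is exactly zero
# wherever the clip saturates, so large outputs stop learning.
clipped_logits = tf.clip_by_value(tf.layers.dense(features, 10), -20.0, 20.0)

# Option 2 (usually better): batch-normalize the activations that feed the
# logits so they stay in a reasonable range in the first place.
normalized = tf.layers.batch_normalization(features, training=is_training)
logits = tf.layers.dense(normalized, 10)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))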