Tags: tensorflow, neural-network, softmax

Can I use real probability distributions as labels for tf.nn.softmax_cross_entropy_with_logits?


In the TensorFlow documentation, the description of labels reads:

labels: Each row labels[i] must be a valid probability distribution.

Does this mean that labels can look like the following, if I have real probability distributions over the classes for each input?

[[0.1, 0.2, 0.05, 0.007, ...],
 [0.001, 0.2, 0.5, 0.007, ...],
 [0.01, 0.0002, 0.005, 0.7, ...]]

And is this more efficient than using one-hot encoded labels?

Thank you in advance.


Solution

  • In a word, yes, you can use probabilities as labels.

    The documentation for tf.nn.softmax_cross_entropy_with_logits says you can:

    NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.

    If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.
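
    As a side note: if your real-valued targets are only proportional to probabilities rather than already summing to 1 per row, you can normalize them first to satisfy the requirement above. A minimal NumPy sketch (treating the truncated rows from the question as hypothetical raw scores):

    import numpy as np

    # Hypothetical raw, non-negative scores; the rows need not sum to 1 yet.
    raw = np.array([[0.1, 0.2, 0.05, 0.007],
                    [0.001, 0.2, 0.5, 0.007]])

    # Normalize each row into a valid probability distribution,
    # as required for the labels argument.
    labels = raw / raw.sum(axis=1, keepdims=True)
    print(labels.sum(axis=1))  # each row now sums to 1 (up to floating point)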

    Let's run a short example to make sure it works:

    import numpy as np
    import tensorflow as tf
    
    labels = np.array([[0.2, 0.3, 0.5], [0.1, 0.7, 0.2]])
    logits = np.array([[5.0, 7.0, 8.0], [1.0, 2.0, 4.0]])
    
    sess = tf.Session()
    ce = tf.nn.softmax_cross_entropy_with_logits(
         labels=labels, logits=logits).eval(session=sess)
    print(ce)  # [ 1.24901222  1.86984602]
    
    # manual check
    predictions = np.exp(logits)
    predictions = predictions / predictions.sum(axis=1, keepdims=True)
    ce_np = (-labels * np.log(predictions)).sum(axis=1)
    print(ce_np)  # [ 1.24901222  1.86984602]
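
    If you are on TensorFlow 2.x (the snippet above uses the 1.x Session API), the same check can be run eagerly without a Session. A minimal sketch under that assumption:

    import numpy as np
    import tensorflow as tf  # assumes TensorFlow 2.x with eager execution

    labels = np.array([[0.2, 0.3, 0.5], [0.1, 0.7, 0.2]])
    logits = np.array([[5.0, 7.0, 8.0], [1.0, 2.0, 4.0]])

    # The op returns a tensor directly in eager mode.
    ce = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    print(ce.numpy())  # [ 1.24901222  1.86984602]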
    

    And if you have exclusive labels (one and only one class is true per example), it is better to use tf.nn.sparse_softmax_cross_entropy_with_logits with integer class indices rather than tf.nn.softmax_cross_entropy_with_logits with an explicit one-hot probability representation like [1.0, 0.0, ...]. The integer class indices give you a shorter label representation.
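
    For illustration, a minimal sketch of the sparse variant, using the logits from the example above with hypothetical integer class indices (one index per row replaces the whole distribution):

    import numpy as np
    import tensorflow as tf

    # Hypothetical integer class indices: class 2 for the first example,
    # class 1 for the second (equivalent to one-hot rows [0, 0, 1] and [0, 1, 0]).
    sparse_labels = np.array([2, 1])
    logits = np.array([[5.0, 7.0, 8.0], [1.0, 2.0, 4.0]])

    sess = tf.Session()
    ce_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=sparse_labels, logits=logits).eval(session=sess)
    print(ce_sparse)  # same values as the one-hot case with softmax_cross_entropy_with_logits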