
TensorFlow - Using tf.losses.hinge_loss causes a "Shapes incompatible" error


My current code using sparse_softmax_cross_entropy works fine.

loss_normal = (
    tf.reduce_mean(tf.losses
                   .sparse_softmax_cross_entropy(labels=labels,
                                                 logits=logits,
                                                 weights=class_weights))
    )

However, when I try to use hinge_loss:

loss_normal = (
    tf.reduce_mean(tf.losses
                   .hinge_loss(labels=labels,
                               logits=logits,
                               weights=class_weights))
    )

It reports the following error:

ValueError: Shapes (1024, 2) and (1024,) are incompatible

The error seems to originate from this check in the losses_impl.py file:

  with ops.name_scope(scope, "hinge_loss", (logits, labels)) as scope:
    ...
    logits.get_shape().assert_is_compatible_with(labels.get_shape())
    ...
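
The check can be reproduced in isolation with the shapes from the error message (a minimal sketch, independent of my model):

import tensorflow as tf

# The same compatibility check that hinge_loss runs internally:
logits_shape = tf.TensorShape([1024, 2])
labels_shape = tf.TensorShape([1024])
logits_shape.assert_is_compatible_with(labels_shape)
# ValueError: Shapes (1024, 2) and (1024,) are incompatible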

I modified my code as below to extract just one column of the logits tensor:

loss_normal = (
    tf.reduce_mean(tf.losses
                   .hinge_loss(labels=labels,
                               logits=logits[:,1:],
                               weights=class_weights
                               ))
    )

But it still reports a similar error:

ValueError: Shapes (1024, 1) and (1024,) are incompatible.

Can someone please point out why my code works fine with the sparse_softmax_cross_entropy loss but not with hinge_loss?


Solution

  • The labels tensor has shape [1024], while the logits tensor has shape [1024, 2]. This works fine for tf.nn.sparse_softmax_cross_entropy_with_logits:

    • labels: Tensor of shape [d_0, d_1, ..., d_{r-1}] (where r is rank of labels and result) and dtype int32 or int64. Each entry in labels must be an index in [0, num_classes). Other values will raise an exception when this op is run on CPU, and return NaN for corresponding loss and gradient rows on GPU.
    • logits: Unscaled log probabilities of shape [d_0, d_1, ..., d_{r-1}, num_classes] and dtype float32 or float64.

    But the requirements of tf.losses.hinge_loss are different:

    • labels: The ground truth output tensor. Its shape should match the shape of logits. The values of the tensor are expected to be 0.0 or 1.0.
    • logits: The logits, a float tensor.
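
    To make the contrast concrete, here is a minimal sketch of both contracts (TF 1.x placeholders; the shapes mirror the question and are only for illustration):

      import tensorflow as tf

      labels = tf.placeholder(tf.int64, [1024])       # class indices, shape (1024,)
      logits = tf.placeholder(tf.float32, [1024, 2])  # per-class scores, shape (1024, 2)

      # OK: labels hold indices; logits carry the extra num_classes dimension.
      ce = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

      # Raises ValueError: hinge_loss expects labels and logits to have matching shapes.
      # hl = tf.losses.hinge_loss(labels=labels, logits=logits)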

    You can resolve this in two ways:

    • Reshape the labels to [1024, 1] and use just one column of the logits, as you did with logits[:,1:]:

      labels = tf.reshape(labels, [-1, 1])  # (1024,) -> (1024, 1)
      hinge_loss = (
          tf.reduce_mean(tf.losses.hinge_loss(labels=labels,
                                              logits=logits[:,1:],
                                              weights=class_weights))
          )
      

      I think you'll also need to reshape the class_weights the same way.
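
      For example, a one-line sketch of that reshape (assuming class_weights is a per-example weight tensor of shape [1024]):

      class_weights = tf.reshape(class_weights, [-1, 1])  # (1024,) -> (1024, 1)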

    • Use all of the learned logits features via tf.reduce_sum, which produces a flat (1024,) tensor:

      logits = tf.reduce_sum(logits, axis=1)  # (1024, 2) -> (1024,)
      hinge_loss = (
          tf.reduce_mean(tf.losses.hinge_loss(labels=labels,
                                              logits=logits,
                                              weights=class_weights))
          )
      

      This way you don't need to reshape labels or class_weights.
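
      For completeness, here is a runnable end-to-end sketch of this second approach (TF 1.x API; the placeholder shapes and random feed values are illustrative assumptions). Note that tf.losses.hinge_loss already reduces to a scalar with its default reduction, so the extra tf.reduce_mean is omitted:

      import numpy as np
      import tensorflow as tf

      labels = tf.placeholder(tf.float32, [None])         # ground truth, 0.0 or 1.0
      logits_2d = tf.placeholder(tf.float32, [None, 2])   # per-class scores
      class_weights = tf.placeholder(tf.float32, [None])  # per-example weights

      flat_logits = tf.reduce_sum(logits_2d, axis=1)      # (batch, 2) -> (batch,)
      hinge_loss = tf.losses.hinge_loss(labels=labels,
                                        logits=flat_logits,
                                        weights=class_weights)

      with tf.Session() as sess:
          feed = {labels: np.random.randint(0, 2, 1024).astype(np.float32),
                  logits_2d: np.random.randn(1024, 2).astype(np.float32),
                  class_weights: np.ones(1024, np.float32)}
          print(sess.run(hinge_loss, feed_dict=feed))     # prints a scalar loss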