Tags: machine-learning, neural-network, gradient-descent, hamming-distance, multilabel-classification

Gradient calculation in Hamming loss for multi-label classification


I am doing multilabel classification using a recurrent neural network. My question is about the loss function: my outputs are vectors of true/false (1/0) values indicating each label's class. Many resources say the Hamming loss is the appropriate objective. However, the Hamming loss has a problem in the gradient calculation: H = average(y_true XOR y_pred), and the XOR operation is not differentiable, so the gradient of the loss cannot be computed. Are there other loss functions for training multilabel classification? I've tried MSE and binary cross-entropy with individual sigmoid outputs.


Solution

  • H = average(y_true*(1-y_pred) + (1-y_true)*y_pred)

    is a continuous approximation of the Hamming loss: it equals the XOR form whenever y_pred is a hard 0/1 vector, and it remains differentiable with respect to y_pred when y_pred holds sigmoid probabilities.
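Below is a minimal NumPy sketch (not part of the original answer) comparing the exact XOR-based Hamming loss with this continuous surrogate; the function names and example arrays are hypothetical illustrations:

```python
import numpy as np

def hamming_loss_exact(y_true, y_pred):
    """Exact Hamming loss: fraction of label positions that disagree.
    Needs hard 0/1 predictions, so it gives no usable gradient."""
    return np.mean(y_true != y_pred)

def hamming_loss_surrogate(y_true, y_pred):
    """Continuous surrogate from the answer:
    average(y_true*(1-y_pred) + (1-y_true)*y_pred).
    Matches the exact loss on hard 0/1 predictions and stays
    differentiable when y_pred holds sigmoid probabilities."""
    return np.mean(y_true * (1.0 - y_pred) + (1.0 - y_true) * y_pred)

# Hypothetical example: 2 samples, 4 labels each.
y_true    = np.array([[1, 0, 1, 0],
                      [0, 1, 1, 1]], dtype=float)
hard_pred = np.array([[1, 1, 1, 0],
                      [0, 0, 1, 1]], dtype=float)
soft_pred = np.array([[0.9, 0.8, 0.7, 0.1],
                      [0.2, 0.3, 0.9, 0.6]], dtype=float)

print(hamming_loss_exact(y_true, hard_pred))      # 0.25
print(hamming_loss_surrogate(y_true, hard_pred))  # 0.25 (agrees on hard labels)
print(hamming_loss_surrogate(y_true, soft_pred))  # smooth in y_pred, so it can back-propagate
```

If you train with Keras/TensorFlow (as the sigmoid-output setup in the question suggests), the same expression can be passed as a custom loss, e.g. `loss=lambda yt, yp: tf.reduce_mean(yt * (1.0 - yp) + (1.0 - yt) * yp)` in `model.compile`, assuming the final layer uses per-label sigmoid activations.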