I am doing a multilabel classification using some recurrent neural network structure. My question is about the loss function: my output will be vectors of true/false (1/0) values to indicate each label's class. Many resources said the Hamming loss is the appropriate objective. However, the Hamming loss has a problem in the gradient calculation: H = average (y_true XOR y_pred),the XOR cannot derive the gradient of the loss. So is there other loss functions for training multilabel classification? I've tried MSE and binary cross-entropy with individual sigmoid input.
H = average(y_true*(1-y_pred)+(1-y_true)*y_pred)
is a continuous approximation of the hamming loss.