Tags: tensorflow, keras, deep-learning, loss-function, loss

Higher loss penalty for true non-zero predictions


I am building a deep regression network (CNN) to predict a (1000,1) target vector from (7,11) images. The target usually consists of about 90% zeros and only 10% non-zero values. The distribution of (non-)zero values in the targets varies from sample to sample (i.e. there is no global class imbalance).

Using a mean squared error loss, this led to the network predicting only zeros, which I don't find surprising.

My best guess is to write a custom loss function that penalizes errors regarding non-zero values more than the prediction of zero-values.

I have tried the loss function below with the intent of implementing what I guessed could work above. It is a mean squared error loss in which errors on zero targets are penalized less (w=0.1) than errors on non-zero targets.

import tensorflow as tf
from tensorflow.keras import backend as K

def my_loss(y_true, y_pred):
    # weight errors on zero targets less (w) than errors on non-zero targets (1 + w)
    w = 0.1
    # zero out predictions wherever the target is zero, so the first term
    # only measures errors on non-zero targets
    y_pred_of_nonzeros = tf.where(tf.equal(y_true, 0), tf.zeros_like(y_pred), y_pred)
    return K.mean(K.square(y_true - y_pred_of_nonzeros)) + w * K.mean(K.square(y_true - y_pred))

The network is able to learn without getting stuck in all-zero predictions. However, this solution seems quite unclean. Is there a better way to deal with this type of problem? Any advice on improving the custom loss function? Any suggestions are welcome; thank you in advance!

Best, Lukas


Solution

  • Not sure there is anything better than a custom loss like the one you wrote, but there is a cleaner way to express it:

    from tensorflow.keras import backend as K

    def weightedLoss(w):

        def loss(true, pred):
            # element-wise squared error, scaled down by w wherever the target is zero
            error = K.square(true - pred)
            error = K.switch(K.equal(true, 0), w * error, error)

            return error

        return loss
    

    You may also return K.mean(error), but without the mean you can still benefit from other Keras features, such as sample weights.
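    For instance, here is a rough sketch of passing per-sample weights to fit(); the data and weight values are placeholders, and the model is assumed to already be compiled with this loss:

    import numpy as np

    sample_weights = np.ones(len(x_train))   # one weight per training sample
    sample_weights[:100] = 2.0               # e.g. emphasize the first 100 samples (illustrative)
    model.fit(x_train, y_train, sample_weight=sample_weights, epochs=10)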

    Select the weight when compiling:

    model.compile(loss = weightedLoss(0.1), ...)
    

    If you have the entire target data in an array, you can compute the weight from it:

    w = K.mean(y_train)
    w = w / (1 - w)  # compensates for the zero class making up ~90% of the targets
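
    Putting the two pieces together (a sketch; y_train is assumed to be a NumPy array of 0/1-like targets, np.mean stands in for K.mean, and the optimizer choice is illustrative):

    import numpy as np

    w = float(np.mean(y_train))   # ~0.1 if roughly 10% of the targets are non-zero
    w = w / (1 - w)               # ~0.11: down-weights errors on zero targets
    model.compile(optimizer='adam', loss=weightedLoss(w))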
    

    Another solution, which avoids a custom loss but requires changes to both the data and the model, is:

    • Transform your y into a 2-class problem for each output. Shape = (batch, originalClasses, 2).

    For the zero values, set the first of the two classes to 1.
    For the non-zero values, set the second of the two classes to 1.

    newY = np.stack([1-oldY, oldY], axis=-1)    
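
    As a toy illustration of the shapes (the values are made up):

    import numpy as np

    oldY = np.array([[0., 1., 0., 0.]])         # (batch, classes) = (1, 4)
    newY = np.stack([1 - oldY, oldY], axis=-1)  # (1, 4, 2)
    # newY[0, 0] == [1., 0.]  -> zero target: first class set to 1
    # newY[0, 1] == [0., 1.]  -> non-zero target: second class set to 1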
    

    Adjust the model to output this new shape.

    ...
    model.add(Dense(2*classes))
    model.add(Reshape((classes,2)))
    model.add(Activation('softmax'))
    

    Make sure you are using a softmax activation (as in the snippet above) and categorical_crossentropy as the loss.

    Then use the argument class_weight={0: w, 1: 1} in fit.
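
    A minimal sketch of the compile and fit step for this reformulation; x_train, newY and w are assumed to come from the steps above, the fit arguments are illustrative, and support for class_weight with 3-D targets can vary between Keras versions:

    model.compile(optimizer='adam', loss='categorical_crossentropy')
    model.fit(x_train, newY, class_weight={0: w, 1: 1}, epochs=10, batch_size=32)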