Tags: tensorflow, machine-learning, scikit-learn, keras, metrics

F1 score for padded outputs in Keras


I have an LSTM sequence tagger in Keras which I use on highly unbalanced data. I would therefore like to use the (multiclass) F1 score as the model's main metric. I have two questions:

1) I use zero-padding in the data (and thus mask_zero=True in my embeddings), and all the losses are automatically computed on the masked data. However, I suppose that masking has to be done manually for custom metric computation? Is there an efficient, vectorized solution for that?

2) Is it possible to pass sklearn's f1_score implementation into the model's compile step (maybe after wrapping it in some way)? Right off the bat it didn't work, because apparently a placeholder was passed into it rather than a NumPy array (I use the TensorFlow backend).

[UPD] Given my implementation, there is now a follow-up question: is there a way to have the model's output masked as well? If we don't care about the model's output at the 'pad' input positions (they don't contribute to the loss anyway), there may be arbitrary garbage in the output at those positions, which will skew the F1 metric. Ideally the output would contain only zeros there as well.


Solution

  • Switched to the following (based on this code):

    import numpy as np
    from keras.callbacks import Callback
    from sklearn.metrics import f1_score


    class ZeroPaddedF1Score(Callback):
        def on_train_begin(self, logs=None):
            self.val_f1s = []

        def on_epoch_end(self, epoch, logs=None):
            # Collapse one-hot labels and predictions to class indices
            y_true = np.argmax(self.validation_data[1], axis=-1)
            y_pred = np.argmax(self.model.predict(self.validation_data[0]), axis=-1)
            val_f1 = zero_padded_f1(y_true, y_pred)
            self.val_f1s.append(val_f1)
            print(' - val_f1: %f' % val_f1)


    def zero_padded_f1(y_true, y_pred):
        # Keep only the positions whose true label is not 0 (the padding
        # class), so garbage predictions at pad positions are ignored
        y_pred_flat, y_true_flat = [], []
        for y_pred_i, y_true_i in zip(y_pred.flatten(), y_true.flatten()):
            if y_true_i != 0:
                y_pred_flat.append(y_pred_i)
                y_true_flat.append(y_true_i)
        return f1_score(y_true_flat, y_pred_flat, average='macro')
    

    It probably won't work with model.compile (it operates on NumPy arrays, i.e. on the output of an already compiled model rather than on symbolic tensors), but it does the job as a callback; a usage sketch and a vectorized variant follow below.
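
A minimal usage sketch (the x_train, y_train, x_val, y_val names and the epoch count are placeholders, not taken from the original setup). Passing validation_data to fit is what populates self.validation_data inside the callback; note that newer Keras versions removed that attribute, in which case the validation arrays would have to be handed to the callback's constructor instead:

    f1_callback = ZeroPaddedF1Score()
    model.fit(x_train, y_train,
              validation_data=(x_val, y_val),  # populates self.validation_data
              epochs=10,
              callbacks=[f1_callback])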
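
As for the vectorization question (1): the Python loop in zero_padded_f1 can be replaced with boolean-mask indexing in NumPy. A minimal sketch, under the same assumption that label 0 is the padding class:

    import numpy as np
    from sklearn.metrics import f1_score


    def zero_padded_f1_vectorized(y_true, y_pred):
        # Boolean mask selecting non-pad positions (true label != 0)
        mask = y_true.flatten() != 0
        return f1_score(y_true.flatten()[mask],
                        y_pred.flatten()[mask],
                        average='macro')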