Tags: python, machine-learning, keras, lstm, loss-function

Does Keras ignore labels of masked values?


I'm implementing an LSTM model with Keras. I padded my sequences to a fixed length so the dataset is fed into the model in the right shape.
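
For reference, a minimal sketch of that padding step (assuming pad_sequences from Keras; the shapes below are made up):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy ragged dataset: three sequences of different lengths, two features each.
sequences = [np.ones((5, 2)), np.ones((3, 2)), np.ones((8, 2))]

# Zero-pad every sequence to the length of the longest one (post-padding),
# so the padded timesteps can later be recognized by Masking(mask_value=0.).
padded = pad_sequences(sequences, padding='post', dtype='float32', value=0.)
print(padded.shape)  # (3, 8, 2) -> (samples, timesteps, features)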

At the moment, my model is the following:

import tensorflow as tf
from tensorflow.keras.layers import Masking, LSTM, Dropout, Dense

model = tf.keras.Sequential()
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(LSTM(units=100, return_sequences=True))  # input_shape is not needed again after Masking
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
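
For context, the Masking layer marks a timestep as masked only when all of its features equal mask_value. A minimal sketch (assuming TF 2.x and a made-up input):

import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Masking(mask_value=0.)
x = np.array([[[1., 2.], [0., 0.], [3., 0.]]], dtype='float32')  # (1, 3, 2)
print(layer.compute_mask(tf.constant(x)).numpy())
# [[ True False  True]] -> only the all-zero timestep is masked out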

Does Keras automatically skip labels of masked values in the loss function?


Solution

  • Yes. If your model uses masking, the objective (i.e. loss) function is automatically augmented to support masking, so masked samples/timesteps are ignored when the loss is computed. Under the hood, weighted_masked_objective is the function that does this:

    from keras import backend as K  # needed to run this excerpt standalone

    def weighted_masked_objective(fn):
        """Adds support for masking and sample-weighting to an objective function.
        It transforms an objective function `fn(y_true, y_pred)`
        into a sample-weighted, cost-masked objective function
        `fn(y_true, y_pred, weights, mask)`.
        # Arguments
            fn: The objective function to wrap,
                with signature `fn(y_true, y_pred)`.
        # Returns
            A function with signature `fn(y_true, y_pred, weights, mask)`.
        """
        if fn is None:
            return None
    
        def weighted(y_true, y_pred, weights, mask=None):
            """Wrapper function.
            # Arguments
                y_true: `y_true` argument of `fn`.
                y_pred: `y_pred` argument of `fn`.
                weights: Weights tensor.
                mask: Mask tensor.
            # Returns
                Scalar tensor.
            """
            # score_array has ndim >= 2
            score_array = fn(y_true, y_pred)
            if mask is not None:
                # Cast the mask to floatX to avoid float64 upcasting in Theano
                mask = K.cast(mask, K.floatx())
                # mask should have the same shape as score_array
                score_array *= mask
                #  the loss per batch should be proportional
                #  to the number of unmasked samples.
                score_array /= K.mean(mask) + K.epsilon()
    
            # apply sample weighting
            if weights is not None:
                # reduce score_array to same ndim as weight array
                ndim = K.ndim(score_array)
                weight_ndim = K.ndim(weights)
                score_array = K.mean(score_array,
                                     axis=list(range(weight_ndim, ndim)))
                score_array *= weights
                score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
            return K.mean(score_array)
        return weighted
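
    The two mask lines are the key step: multiplying score_array by the mask zeroes out the masked entries, and dividing by K.mean(mask) rescales the result so it equals the average over the unmasked entries only. For example, with per-timestep scores [2, 4, 6] and mask [1, 1, 0]: mean([2, 4, 0]) / mean([1, 1, 0]) = 2 / (2/3) = 3, which is exactly the mean of the two unmasked scores.

    You can also verify the behaviour empirically. A minimal sketch (assuming TF 2.x; the shapes and data are made up): changing the labels only at masked timesteps should leave the loss unchanged.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.layers import Masking, LSTM, Dense

    model = tf.keras.Sequential([
        Masking(mask_value=0., input_shape=(4, 3)),
        LSTM(8, return_sequences=True),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy')

    x = np.random.rand(2, 4, 3).astype('float32')
    x[:, 2:, :] = 0.  # last two timesteps are all-zero -> masked

    y_a = np.zeros((2, 4, 1), dtype='float32')
    y_b = y_a.copy()
    y_b[:, 2:, :] = 1.  # change labels only at the masked timesteps

    loss_a = model.evaluate(x, y_a, verbose=0)
    loss_b = model.evaluate(x, y_b, verbose=0)
    print(np.isclose(loss_a, loss_b))  # True: masked labels do not affect the loss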