
Keras - Theano - Test for division by zero


I have a Layer that computes the mean over timesteps and supports masking. My problem is that the mask can be empty (an example made up entirely of padded timesteps), and I don't know how to check for zeros when working with symbolic tensors.

A few of my training examples have an empty mask, so I get a NaN loss and the program crashes.

This is my Layer:

from keras import backend as K
from keras.engine.topology import Layer  # Keras 1.x


class MeanOverTime(Layer):
    def __init__(self, **kwargs):
        self.supports_masking = True
        super(MeanOverTime, self).__init__(**kwargs)

    def call(self, x, mask=None):
        if mask is not None:
            return K.cast(x.sum(axis=1) / mask.sum(axis=1, keepdims=True), K.floatx())  # division by zero when the mask sums to zero
        else:
            return K.mean(x, axis=1)

    def get_output_shape_for(self, input_shape):
        return input_shape[0], input_shape[-1]

    def compute_mask(self, input, input_mask=None):
        return None

The mask.sum(axis=1, keepdims=True) term becomes zero for those examples. To bypass this I increased input_length so that it covers all my training examples, but that is not a real solution. I also tried adding a try/except, but that didn't work either.
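The failure is easy to reproduce outside Theano; here is a minimal NumPy mirror of the same arithmetic (assuming a binary 0/1 mask):

```python
import numpy as np

# Toy batch: 2 examples, 3 timesteps, 2 features; example 2 is all padding.
x = np.array([[[1., 2.], [3., 4.], [0., 0.]],
              [[0., 0.], [0., 0.], [0., 0.]]])
mask = np.array([[1., 1., 0.],
                 [0., 0., 0.]])   # all-zero row = empty mask

with np.errstate(divide="ignore", invalid="ignore"):
    mean = x.sum(axis=1) / mask.sum(axis=1, keepdims=True)
# mean[0] == [2., 3.], but mean[1] is [nan, nan] -> NaN loss
```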


Solution

  • try/except won't work, because this piece of code only builds the symbolic tensor graph, and no exception is raised while building it. The evaluation, and hence the division by zero, happens later inside the fit/evaluate/predict functions. You need to include the logic/decision in the symbolic graph itself.

    You can use switch(condition, then_expression, else_expression) to build the if/else into the graph:

    def call(self, x, mask=None):
        if mask is not None:
            mask_sum = mask.sum(axis=1, keepdims=True)
            cond = K.equal(mask_sum, 0)
            _the_other_tensor_ = ....  # fallback divisor for empty masks
            div = K.switch(cond, _the_other_tensor_, mask_sum)
            return K.cast(x.sum(axis=1) / div, K.floatx())  # safe as long as the fallback is non-zero
        else:
            return K.mean(x, axis=1)
    
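    To see the switch logic concretely, here is a NumPy sketch, with np.where standing in for K.switch; using a divisor of 1 for empty masks is one hypothetical choice of fallback (the snippet above deliberately leaves the else-branch open):

```python
import numpy as np

x = np.array([[[1., 2.], [3., 4.], [0., 0.]],
              [[0., 0.], [0., 0.], [0., 0.]]])
mask = np.array([[1., 1., 0.],
                 [0., 0., 0.]])

mask_sum = mask.sum(axis=1, keepdims=True)   # [[2.], [0.]]
# np.where plays the role of K.switch(cond, then, else); a divisor of 1
# for empty masks is a hypothetical fallback -- their mean becomes 0.
div = np.where(mask_sum == 0., np.ones_like(mask_sum), mask_sum)
mean = x.sum(axis=1) / div
# mean[0] is the masked mean; mean[1] is all zeros instead of NaN
```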

    Or just use clip(x, min_value, max_value) to bound the divisor away from zero with a very small epsilon, making the division numerically stable.

    def call(self, x, mask=None):
        if mask is not None:
            mask_sum = mask.sum(axis=1, keepdims=True)
            # K.epsilon is a function, and the upper bound must not clip
            # real mask sums, so leave it unbounded (requires numpy as np)
            div = K.clip(mask_sum, K.epsilon(), np.inf)
            return K.cast(x.sum(axis=1) / div, K.floatx())  # divisor is never zero now
        else:
            return K.mean(x, axis=1)
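    The effect of the clip, again mirrored in NumPy (np.clip for K.clip, a literal 1e-7 standing in for K.epsilon()):

```python
import numpy as np

mask = np.array([[1., 1., 0.],
                 [0., 0., 0.]])
mask_sum = mask.sum(axis=1, keepdims=True)
eps = 1e-7                                  # stands in for K.epsilon()
div = np.clip(mask_sum, eps, np.inf)        # only the lower bound ever binds
x_sum = np.array([[4., 6.], [0., 0.]])      # x.sum(axis=1) from the layer
mean = x_sum / div
# every entry is finite; empty-mask rows come out as 0 rather than NaN
```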