Tags: python, tensorflow, keras, keras-layer, loss-function

Seq2Seq models and loss functions (in Keras)


I'm having a problem with my seq2seq model: in some cases it works just fine, but in other cases it returns only the end token as a result.

For example, given this vector:
[2, #start token
3,
123,
1548, #end token
1548,
1548,
1548,
1548,
1548,
1548]

The model predicts:
[1548, 
1548,
1548,
1548,
1548,
1548,
1548,
1548,
1548,
1548]

I tried to use the SaveModel callback from Keras, monitoring "loss", but it still gives the same result.

So I figured that maybe I should use my own loss function.

A simple loss function that Keras provides:

from keras import backend as K

def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)

Both y_true and y_pred are TensorFlow objects (we get only a pointer to the real array), so in order to implement custom logic we would need to either fetch the array from the GPU or upload our own array to the GPU.

The loss function I want:

def mean_absolute_error(y_true, y_pred):
    #intended logic: skip positions where the prediction matches
    #a start/end token (this does not run on symbolic tensors as-is)
    total = 0
    for y, _y in zip(y_true, y_pred):
        if (y == _y) and (y == self.startToken or y == self.endToken):
            continue
        else:
            total += abs(y - _y)
    return total

I tried to use y_true.eval(), which should bring the array to the CPU as a numpy object, but I got: "Cannot evaluate tensor using eval(): No default session is registered".

And I didn't manage to find out how to upload my own array into TensorFlow.

If you have a solution or any suggestion, I will be more than happy to hear about it.

Thanks..

(Not too important, but ...)

The model is based on https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html, but with one-hot (two-dimensional matrix) output.


Solution

  • Using K.eval or if in loss functions is not a good idea. The whole idea of tensors is that they keep an internal connection managed by TensorFlow/Keras, through which it's possible to compute gradients and other things.

    Using eval and working on numpy values will break this connection and spoil the model. Use eval only to inspect results, not to create functions.

    Using ifs will not work because the tensors' values are not available when the graph is built. But there are Keras backend functions, such as K.switch, K.greater, K.less, etc., all listed in the backend documentation.

    You can recreate your function using those functions, as in the sketch below.
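
    For instance, a minimal sketch of how backend functions replace a Python if (the condition and the 0.5 threshold here are purely illustrative, not part of the question):

    from keras import backend as K

    def conditional_loss(y_true, y_pred):
        #boolean tensor computed elementwise, instead of a Python "if"
        condition = K.greater(K.abs(y_pred - y_true), 0.5)
        #K.switch selects between the two branches wherever the condition holds
        return K.mean(K.switch(condition,
                               K.abs(y_pred - y_true),
                               K.zeros_like(y_pred)), axis=-1)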

    But honestly, I think you should go for "masking" or "class weighting" instead.

    Masking (solution 1)

    If you're using embedding layers, you can intentionally reserve the value zero for "nothing after the end".

    You can then use mask_zero=True in the embedding layers and have inputs like this:

    [2, #start token
    3,
    123,
    1548, #end token
    0, #nothing, value to be masked
    0,
    0,
    0,
    0,
    0]
    

    Another option is to not have an "end token" and use "zero" instead.
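
    A minimal sketch of what this could look like (the vocabulary size, embedding size, and layer widths below are illustrative assumptions, not values from the question):

    from keras.layers import Dense, Embedding, Input, LSTM
    from keras.models import Model

    vocab_size = 1549  #illustrative: tokens 1..1548, with index 0 reserved for masking
    embed_dim = 64     #illustrative embedding size

    inputs = Input(shape=(None,))
    #mask_zero=True makes downstream layers skip timesteps whose input is 0
    embedded = Embedding(vocab_size, embed_dim, mask_zero=True)(inputs)
    hidden = LSTM(128, return_sequences=True)(embedded)
    outputs = Dense(vocab_size, activation='softmax')(hidden)

    model = Model(inputs, outputs)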

    Class weighting (solution 2)

    Since this is very probably happening because you have many more end tokens than anything else in your desired outputs, you can reduce the relevance of the end tokens.

    Count the occurrences of each class in your outputs and calculate a ratio for the end tokens. An example:

    • Calculate a mean of the occurrences of all other classes
    • Count the occurrences of end token
    • ratio = other_classes_mean / end_token_occurrences (see the sketch below)
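
    A minimal sketch of those three steps with numpy (assuming targets is an integer array holding your desired output sequences, with 1548 as the end token, as in the question):

    import numpy as np
    from collections import Counter

    end_token = 1548
    #targets: your desired output sequences as an integer array (hypothetical name)
    counts = Counter(targets.flatten().tolist())

    end_token_occurrences = counts.pop(end_token)
    other_classes_mean = np.mean(list(counts.values()))

    ratio = other_classes_mean / end_token_occurrences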

    Then in the fit method, use:

    class_weight = {0:1, 1:1, 2:1, ...., 1548:ratio, 1549:1,1550:1,...}
    

    Easily doable with:

    class_weight = {i:1. for i in range(totalTokens)}
    class_weight[1548] = ratio
    model.fit(...,...,....., class_weight = class_weight,...)
    

    (Make sure you have 0 as a possible class in this case, or shift the indices by 1)

    A similar loss function (solution 3)

    Notice that y_pred will never be "equal" to y_true.

    • y_pred is variable, continuous and differentiable
    • y_true is exact and constant

    For comparison, you should take the "argmax", which is very similar to (if not exactly) a class index.

    from keras import backend as K

    def mean_absolute_error(y_true, y_pred):

        #for comparing, let's take exact values
        y_true_max = K.argmax(y_true)
        y_pred_max = K.argmax(y_pred)

        #compare with proper tensor functions
        equal_mask = K.equal(y_true_max, y_pred_max)
        is_start = K.equal(y_true_max, self.startTokenAsIndex)
        is_end = K.equal(y_true_max, self.endTokenAsIndex)

        #cast to float for multiplying and summing;
        #these become tensors with 0 (false) and 1 (true) as floats
        equal_mask = K.cast(equal_mask, K.floatx())
        is_start = K.cast(is_start, K.floatx())
        is_end = K.cast(is_end, K.floatx())

        #the entire condition as you wanted:
        #sum = "or", multiply = "and"; the sum never reaches 2
        #because startToken is never equal to endToken
        condition = (is_start + is_end) * equal_mask

        #reverse the condition: 1 where the loss counts, 0 where it doesn't
        condition = 1 - condition

        #result
        return condition * K.mean(K.abs(y_pred - y_true), axis=-1)
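
    Whichever loss function you end up with, you would then plug it in at compile time as usual, for example:

    model.compile(optimizer='adam', loss=mean_absolute_error)

    (The optimizer here is just an example; use whatever your model already uses.)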