I'm having a problem with my seq2seq model: in some cases it works just fine, but in some cases it returns only the end token as a result.
For example, given the vector:
[2, #start token
3,
123,
1548, #end token
1548,
1548,
1548,
1548,
1548,
1548]
The model predicts:
[1548,
1548,
1548,
1548,
1548,
1548,
1548,
1548,
1548,
1548]
I tried to use the SaveModel callback from Keras, monitoring "loss", but it still gives the same result.
So I figured that maybe I should use my own loss function.
The simple loss function that Keras provides:
from keras import backend as K

def mean_absolute_error(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)
Both y_true and y_pred are TensorFlow objects (we get only a pointer to the real array), so in order to implement some logic we need to fetch the array from the GPU, or upload an array of our own to the GPU.
The loss function I want:
def mean_absolute_error(y_true, y_pred):
    sum = 0
    for y, _y in zip(y_true, y_pred):
        if (y == _y) and (y == self.startToken or y == self.endToken):
            continue
        else:
            sum += abs(y - _y)
    return sum
I tried to use y_true.eval(), which should bring the array to the CPU as a numpy object, but got: "Cannot evaluate tensor using eval(): No default session is registered". I also didn't manage to find out how to upload an array of my own into TensorFlow.
If you have a solution or any suggestion, I will be more than happy to hear it.
Thanks.
(Not too important, but...)
The model is based on https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html, but with one-hot (two-dimensional matrix) output.
Using K.eval or if in loss functions is not a good idea. The whole point of tensors is that they keep an internal connection managed by TensorFlow/Keras, through which it's possible to compute gradients and other things. Using eval and working on numpy values would break this connection and spoil the model; use eval only to inspect results, not to create functions.
Using ifs will not work because the tensors' values are not available when the graph is built. But there are Keras functions such as K.switch, K.greater, K.less, etc., all listed in the backend documentation, and you can recreate your function using them.
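For instance, here is a minimal sketch of that idea, assuming the end-token index 1548 from your example and a TensorFlow backend where K.switch accepts an element-wise boolean condition:

from keras import backend as K

END_TOKEN_INDEX = 1548  # assumed from the example vectors above

def loss_skipping_end(y_true, y_pred):
    # per-timestep mean absolute error, shape (batch, time)
    per_step = K.mean(K.abs(y_pred - y_true), axis=-1)
    # boolean tensor: True where the true class is the end token
    is_end = K.equal(K.argmax(y_true, axis=-1), END_TOKEN_INDEX)
    # element-wise switch instead of a python `if`
    return K.switch(is_end, K.zeros_like(per_step), per_step)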
But honestly, I think you should go for "masking" or "class weighting" instead.
If you're using embedding layers, you can intentionally reserve the zero value for "nothing after the end". You can then use mask_zero=True in the embedding layers and have inputs like this:
[2, #start token
3,
123,
1548, #end token
0, #nothing, value to be masked
0,
0,
0,
0,
0]
Another option is to not have an "end token" and use "zero" instead.
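A minimal sketch of the masking setup (vocab_size, embed_dim and maxlen are placeholder names, and index 0 is reserved for the padding value):

from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense

vocab_size, embed_dim, maxlen = 1549, 64, 10  # assumed sizes

inp = Input(shape=(maxlen,))
x = Embedding(vocab_size, embed_dim, mask_zero=True)(inp)  # zeros are masked
x = LSTM(128, return_sequences=True)(x)  # the mask propagates through
out = Dense(vocab_size, activation='softmax')(x)
model = Model(inp, out)

The masked timesteps are then skipped by the recurrent layers and excluded from the loss, so the model is never penalized for whatever it outputs after the end.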
Since this is very probably happening because you have many more end tokens than anything else in your desired outputs, you can reduce the relevance of the end tokens.
Count each class's occurrences in your outputs and calculate a ratio for the end tokens. An example:
ratio = other_classes_mean / end_token_occurences
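For instance, a quick numpy sketch of that count (targets and totalTokens are assumed names for your integer-encoded outputs and vocabulary size):

import numpy as np

end_token = 1548  # from the example above

counts = np.bincount(targets.ravel(), minlength=totalTokens)
other_classes_mean = np.delete(counts, end_token).mean()
ratio = other_classes_mean / counts[end_token]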
Then, in the fit method, use:
class_weight = {0:1, 1:1, 2:1, ...., 1548:ratio, 1549:1,1550:1,...}
Easily doable with:
class_weight = {i:1. for i in range(totalTokens)}
class_weight[1548] = ratio
model.fit(...,...,....., class_weight = class_weight,...)
(Make sure you have 0 as a possible class in this case, or shift the indices by 1)
Notice that y_pred will never be "equal" to y_true: y_pred is variable, continuous, and differentiable, while y_true is exact and constant. For a comparison, you should take the "argmax", which is very similar to (if not exactly) a class index.
def mean_absolute_error(y_true, y_pred):
    # for comparing, let's take exact values
    y_true_max = K.argmax(y_true)
    y_pred_max = K.argmax(y_pred)

    # compare with a proper tensor function
    equal_mask = K.equal(y_true_max, y_pred_max)
    is_start = K.equal(y_true_max, self.startTokenAsIndex)
    is_end = K.equal(y_true_max, self.endTokenAsIndex)

    # cast to float for multiplying and summing
    equal_mask = K.cast(equal_mask, K.floatx())
    is_start = K.cast(is_start, K.floatx())
    is_end = K.cast(is_end, K.floatx())
    # these are tensors with 0 (false) and 1 (true) as float

    # entire condition as you wanted
    condition = (is_start + is_end) * equal_mask
    # sum = "or" ||| multiply = "and"
    # we don't have to worry about the sum resulting in 2,
    # because you will never have startToken == endToken

    # reverse the condition:
    condition = 1 - condition

    # result
    return condition * K.mean(K.abs(y_pred - y_true), axis=-1)
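To use it, pass the function object itself (not a string) when compiling:

model.compile(optimizer='rmsprop', loss=mean_absolute_error)

(The rmsprop optimizer is just the one from the linked tutorial; keep whatever you're already using.)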