python keras tensorflow2.0 attention-model

Am I using tf.math.reduce_sum in the attention model in the right way?

I was trying to use the attention model described here in a simple bidirectional LSTM model. However, after adding the attention layers, I got this error:

ValueError: Unknown initializer: GlorotUniform

To begin with, my code doesn't mix standalone Keras with tf.keras, so there is no incompatibility of that kind. I also tried every solution addressed in this post, but none of them worked for me. I should mention that my code worked without any issues before I added the attention model. So I removed the lines of the attention part one at a time to see which line causes the problem:

import numpy as np
import tensorflow as tf

# n_timesteps, n_features and n_outputs are defined earlier in the script
inputs = tf.keras.layers.Input(shape=(n_timesteps, n_features))
units = 50
activations = tf.keras.layers.Bidirectional(
    tf.compat.v1.keras.layers.CuDNNLSTM(units, return_sequences=True),
    merge_mode='concat')(inputs)
print(np.shape(activations))  # (None, n_timesteps, 2*units)

# Implementation of attention
x1 = tf.keras.layers.Dense(1, activation='tanh')(activations)
print(np.shape(x1))  # (None, n_timesteps, 1): one score per timestep
x1 = tf.keras.layers.Flatten()(x1)
print(np.shape(x1))  # (None, n_timesteps)
x1 = tf.keras.layers.Activation('softmax')(x1)
print(np.shape(x1))  # (None, n_timesteps): weights over timesteps
x1 = tf.keras.layers.RepeatVector(units * 2)(x1)
print(np.shape(x1))  # (None, 2*units, n_timesteps)
x1 = tf.keras.layers.Permute([2, 1])(x1)
print(np.shape(x1))  # (None, n_timesteps, 2*units)
sent_representation = tf.keras.layers.Multiply()([activations, x1])
print(np.shape(sent_representation))  # (None, n_timesteps, 2*units)
sent_representation = tf.keras.layers.Lambda(
    lambda xin: tf.keras.backend.sum(xin, axis=-2),  # sum over the time axis
    output_shape=(units * 2,))(sent_representation)

# softmax for classification
x = tf.keras.layers.Dense(n_outputs, activation='softmax')(sent_representation)
model = tf.keras.models.Model(inputs=inputs, outputs=x)

I found that the line with the Lambda layer and tf.keras.backend.sum causes the error. After some searching, I replaced that line with the following:

sent_representation = tf.math.reduce_sum(sent_representation, axis=-2)

Now, my code works. However, I am not quite sure if this substitution is correct. Am I doing this right?

Edit: Here are the next lines of the code. The error occurs when I try to load the best model for testing:

optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, decay=1e-6, momentum=0.9)
model.compile(loss=lossFunction, optimizer=optimizer, metrics=['accuracy'])
model.summary()  # summary() prints by itself and returns None

# early stopping
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min',
                                      verbose=1, patience=20)
mc = tf.keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                                        mode='max', verbose=1, save_best_only=True)
history = model.fit(trainX, trainy, validation_data=(valX, valy),
                    shuffle=True, epochs=epochs, verbose=0,
                    callbacks=[es, mc])
saved_model = tf.keras.models.load_model(
    'best_model.h5',
    custom_objects={"GlorotUniform": tf.keras.initializers.glorot_uniform()})
# evaluate the model
_, train_acc = saved_model.evaluate(trainX, trainy, verbose=0)
_, val_acc = saved_model.evaluate(valX, valy, verbose=0)
_, accuracy = saved_model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Validation: %.3f, Test: %.3f' % (train_acc, val_acc, accuracy))
y_pred = saved_model.predict(testX, batch_size=64, verbose=1)

Do you see anything in my code that might cause the error I get when I use the Lambda layer?

Solution

The code you provided works for me without problems with both tf.keras.backend.sum and tf.math.reduce_sum.

To answer your question: the substitution doesn't change your network or its behavior. You can verify this yourself; tf.keras.backend.sum gives the same result as tf.math.reduce_sum:

import numpy as np
import tensorflow as tf

X = np.random.uniform(0, 1, (32, 100, 10)).astype('float32')

(tf.keras.backend.sum(X, axis=-2) == tf.reduce_sum(X, axis=-2)).numpy().all()  # True
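Beyond the sum itself, it may help to see what the whole attention block computes. The following NumPy sketch mirrors each layer with random stand-in data (the sizes and the Dense weights are hypothetical, and the Dense bias is omitted for brevity). It suggests that RepeatVector + Permute + Multiply followed by a sum over axis=-2 is simply a softmax-weighted sum of the LSTM outputs over the timesteps, so any correct sum over axis=-2 yields the same sentence representation.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
batch, n_timesteps, units = 4, 7, 50
rng = np.random.default_rng(0)

# Stand-in for the Bidirectional LSTM output: (batch, n_timesteps, 2*units)
activations = rng.standard_normal((batch, n_timesteps, 2 * units)).astype("float32")

# Stand-in for Dense(1, activation='tanh') without bias: one score per timestep
w = rng.standard_normal((2 * units, 1)).astype("float32")
scores = np.tanh(activations @ w)                 # (batch, n_timesteps, 1)

# Flatten + softmax over the time axis
scores = scores.reshape(batch, n_timesteps)       # (batch, n_timesteps)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

# RepeatVector(2*units), then Permute([2, 1])
repeated = np.repeat(weights[:, np.newaxis, :], 2 * units, axis=1)  # (batch, 2*units, n_timesteps)
permuted = repeated.transpose(0, 2, 1)                               # (batch, n_timesteps, 2*units)

# Multiply, then sum over the time axis (axis=-2)
sent_representation = (activations * permuted).sum(axis=-2)          # (batch, 2*units)

# Cross-check: the same pooling written directly as a weighted sum over timesteps
direct = np.einsum("bt,btd->bd", weights, activations)

print(sent_representation.shape)                             # (4, 100)
print(np.allclose(sent_representation, direct, atol=1e-5))   # True
```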

I also suggest wrapping the operation in a Lambda layer.

EDIT: using tf.reduce_sum or tf.keras.backend.sum wrapped in a Lambda layer doesn't raise the error with TF >= 2.2.

When building the model, you should use layers only. If you want to use TensorFlow ops (like tf.reduce_sum or tf.keras.backend.sum), wrap them in a Keras Lambda layer. Without this the model can still work, but using Lambda is good practice to avoid problems later.
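Following that advice, here is a minimal sketch of the attention model with the reduction wrapped in a Lambda layer. It assumes hypothetical sizes (n_timesteps=20, n_features=8, n_outputs=3) and swaps tf.compat.v1.keras.layers.CuDNNLSTM for tf.keras.layers.LSTM so it also runs on a CPU. It only builds the model and runs predict on random data; it does not exercise the .h5 save/load path where the original error appeared.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes for illustration only
n_timesteps, n_features, n_outputs, units = 20, 8, 3, 50

inputs = tf.keras.layers.Input(shape=(n_timesteps, n_features))
# Assumption: plain LSTM instead of CuDNNLSTM; in TF 2.x it can use the cuDNN
# kernel on a GPU when its arguments meet the cuDNN requirements (the defaults do)
activations = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(units, return_sequences=True), merge_mode="concat")(inputs)

# Attention scores -> softmax weights over timesteps -> broadcast to 2*units
x1 = tf.keras.layers.Dense(1, activation="tanh")(activations)
x1 = tf.keras.layers.Flatten()(x1)
x1 = tf.keras.layers.Activation("softmax")(x1)
x1 = tf.keras.layers.RepeatVector(units * 2)(x1)
x1 = tf.keras.layers.Permute([2, 1])(x1)
weighted = tf.keras.layers.Multiply()([activations, x1])

# The TensorFlow op is wrapped in a Lambda layer instead of being called directly
sent_representation = tf.keras.layers.Lambda(
    lambda t: tf.reduce_sum(t, axis=-2))(weighted)

outputs = tf.keras.layers.Dense(n_outputs, activation="softmax")(sent_representation)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs)

X = np.random.uniform(0, 1, (32, n_timesteps, n_features)).astype("float32")
y_pred = model.predict(X, verbose=0)
print(y_pred.shape)  # (32, 3)
```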