Suppose we have a simple Keras model that uses BatchNormalization:
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(1,)),
    tf.keras.layers.BatchNormalization()
])
How do I actually use it with GradientTape? The following doesn't seem to work, as it doesn't update the moving averages.
# model training... we want the output values to be close to 150
for i in range(1000):
    x = np.random.randint(100, 110, 10).astype(np.float32)
    with tf.GradientTape() as tape:
        y = model(np.expand_dims(x, axis=1))
        loss = tf.reduce_mean(tf.square(y - 150))
    grads = tape.gradient(loss, model.variables)
    opt.apply_gradients(zip(grads, model.variables))
In particular, if you inspect the moving averages, they remain the same (inspect model.variables; the averages stay at 0 and 1). I know one can use .fit() and .predict(), but I would like to use GradientTape, and I'm not sure how to do this. Some versions of the documentation suggest updating update_ops, but that doesn't seem to work in eager mode.
In particular, the following code will not output anything close to 150 after the above training.
x = np.random.randint(200, 210, 100).astype(np.float32)
print(model(np.expand_dims(x, axis=1)))
In gradient tape (eager) mode, the BatchNormalization layer should be called with the argument training=True.
Example:
from tensorflow.keras import layers as KL
from tensorflow.keras import models as KM

inp = KL.Input((64, 64, 3))
x = inp
x = KL.Conv2D(3, kernel_size=3, padding='same')(x)
x = KL.BatchNormalization()(x, training=True)
model = KM.Model(inp, x)
Then the moving variables are properly updated:
>>> model.layers[2].weights[2]
<tf.Variable 'batch_normalization/moving_mean:0' shape=(3,) dtype=float32, numpy=array([-0.00062087,  0.00015137, -0.00013239], dtype=float32)>