I'd like to get a better understanding of the training parameter when calling a Keras model.
In all tutorials (like here) it is explained that when you write a custom train step, you should call the model like this (because some layers behave differently depending on whether you are doing training or inference):
pred = model(x, training=True)
and when you want to do inference, you should set training to False:
pred = model(x, training=False)
What I am wondering now is how this is affected by the creation of a functional model. Assume I have two models, base_model and head_model, and I want to create a new model out of those two, where base_model should always be called with training=False
(because I plan on freezing it like in this tutorial here):
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
What will happen in such a case when I later call new_model(x_new, training=True)? Will the training=False set for base_model be overruled? Or will training now always be True for base_model, regardless of what I pass to new_model? If the latter is the case, does that also mean that if I set e.g. outputs = head_model(x, training=True), that part of the new model would always run in training mode? And how does it work out if I don't give any specific value for training and just call new_model(x_new)?
Thanks in advance!
training is a boolean argument that determines whether the call runs in training mode or inference mode. For example, the Dropout layer is primarily used as a regularizer during training: it randomly drops units, but at inference (prediction) time we don't want that to happen.
y = Dropout(0.5)(x, training=True)
With this, we're setting training=True for the Dropout layer at training time. When we call .fit(), Keras sets the flag to True behind the scenes, and when we use evaluate or predict, it sets the flag to False. The same goes for a custom training loop: when we pass the input tensor to the model within the GradientTape scope, we can set this argument ourselves, though if we don't set it manually, the framework will figure it out itself. The same applies at inference time. So the training argument is set to True or False depending on whether we want the layers to operate in training mode or inference mode, respectively.
# training mode
with tf.GradientTape() as tape:
    logits = model(x, training=True)  # forward pass

# inference mode
val_logits = model(x, training=False)
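To make the Dropout behaviour described above concrete, here is a minimal sketch (assuming TensorFlow 2.x; the layer rate and array sizes are arbitrary choices for illustration): in inference mode Dropout is the identity, while in training mode it zeroes a random subset of units and rescales the survivors.

```python
import numpy as np
import tensorflow as tf

x = tf.ones((1, 100))
drop = tf.keras.layers.Dropout(0.5)

# Inference mode: Dropout passes its input through unchanged.
y_inf = drop(x, training=False)

# Training mode: roughly half the units are zeroed, and the
# survivors are scaled by 1 / (1 - rate) = 2.0 so the expected
# activation stays the same.
y_train = drop(x, training=True)
```

Note that this is exactly the switch that .fit() and predict flip for you automatically.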
Now, coming to your question. After defining the model:
# Freeze the base_model
base_model.trainable = False
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
outputs = head_model(x)
new_model = keras.Model(inputs, outputs)
Now, whether you run this new model with .fit() or with a custom training loop, base_model will always run in inference mode, because training=False was passed when it was called in the functional graph; that value is baked into the call and is not overridden by the training argument you later pass to new_model. Conversely, hard-coding outputs = head_model(x, training=True) would force that part to always run in training mode, and sub-models called without an explicit training value simply follow whatever you pass (or Keras infers) at the outer call.
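A minimal sketch to check this behaviour (assuming TensorFlow 2.x; the tiny Dropout-only "base" model is a hypothetical stand-in, chosen because its training/inference difference is easy to observe):

```python
import numpy as np
from tensorflow import keras

# Hypothetical base: just a Dropout layer, so training vs. inference
# mode is directly visible in the output.
base_in = keras.Input(shape=(4,))
base_model = keras.Model(base_in, keras.layers.Dropout(0.5)(base_in))

inputs = keras.Input(shape=(4,))
x = base_model(inputs, training=False)  # baked into this call node
outputs = keras.layers.Dense(2)(x)
new_model = keras.Model(inputs, outputs)

# Probe the base model's output inside new_model's graph.
probe = keras.Model(inputs, x)
data = np.ones((1, 4), dtype="float32")

# Even with training=True on the outer call, the inner Dropout stays
# in inference mode: its output is identical to its input.
out = probe(data, training=True)
```

If you instead built the graph with x = base_model(inputs) (no explicit training value), the outer flag would propagate down and the Dropout would become active under training=True.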