
How can MobileNetV2 have the same number of parameters for different custom input shapes?


I'm following the TensorFlow 2 tutorial on fine-tuning and transfer learning that uses MobileNetV2 as the base architecture.

The first thing I noticed is that the largest input shape available for the pre-trained 'imagenet' weights is (224, 224, 3). I tried to use a custom shape of (640, 640, 3) and, as described in the documentation, it gives a warning saying that the weights for the (224, 224, 3) shape were loaded instead.

So if I load a network like this:

import tensorflow as tf

tf.keras.backend.clear_session()
def create_model():
  base_model = tf.keras.applications.MobileNetV2(input_shape=(640, 640, 3),
                                                 include_top=False)
  x = base_model.output
  x = tf.keras.layers.GlobalAveragePooling2D()(x)
  x = tf.keras.layers.Dense(1, activation='sigmoid')(x)
  model = tf.keras.Model(inputs=base_model.inputs, outputs=x)
  model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.0001),
                loss='binary_crossentropy',
                metrics=[tf.keras.metrics.BinaryAccuracy()])
  return model

tf_model = create_model()

It gives the warning:

WARNING:tensorflow:`input_shape` is undefined or non-square, or `rows` is not in [96, 128, 160, 192, 224]. Weights for input shape (224, 224) will be loaded as the default.

If I use an input shape of (224, 224, 3), the warning goes away. Nevertheless, I checked the number of trainable parameters in both cases using

tf_model.summary()

and found out that the number of trainable parameters is the same

Total params: 2,259,265
Trainable params: 2,225,153
Non-trainable params: 34,112

even though the (spatial) size of the convolutional feature maps changes according to the custom input shape. So how can the number of parameters remain the same even when the convolutional feature maps have bigger (spatial) sizes?
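
For what it's worth, the same comparison can be made programmatically. The helper below is my own and only counts the headless MobileNetV2 backbone (without the pooling and Dense head), but it shows the same effect:

import tensorflow as tf

def backbone_trainable_params(input_shape):
  # Count the trainable parameters of the headless MobileNetV2 backbone
  tf.keras.backend.clear_session()
  base = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False)
  return sum(tf.keras.backend.count_params(w) for w in base.trainable_weights)

print(backbone_trainable_params((224, 224, 3)))  # same value...
print(backbone_trainable_params((640, 640, 3)))  # ...as this one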


Solution

  • You're right. The number of conv parameters depends only on the kernel size, the number of channels of each layer, and the total number of layers; it does not depend on the spatial size of the input.

    However, the problem when you change the input resolution (here 640x640x3) is that the final feature map, right before the fully connected (FC) layer, no longer has the same spatial dimensions as in the network trained with 224x224x3 input. Thus, the original classifier head is not compatible as is.

    Why?

    Example with input resolution 224x224x3:

    1. the 1st conv layer has stride 2, so its output is 112x112x32
    2. the 1st bottleneck block has stride 1, so its output is 112x112x16
    3. the 2nd bottleneck block has stride 2, so its output is 56x56x24
    4. and so on...

    The strides affect the resolution of the intermediate feature maps, not the number of weights. The last feature map is bigger if you use a 640x640x3 input (20x20 instead of 7x7, since the network's total stride is 32), so the FC layer is not compatible. You should transfer the convolutional weights learned by the vanilla model (with 224x224 resolution) to the new convnet that accepts 640x640x3 input data.
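
    A small sketch (mine, not part of the original answer) that makes both points visible: the parameter count stays the same for both resolutions, while the spatial size of the last feature map, which is what a fixed FC head would see, grows with the input:

    import tensorflow as tf

    for shape in [(224, 224, 3), (640, 640, 3)]:
      tf.keras.backend.clear_session()
      base = tf.keras.applications.MobileNetV2(input_shape=shape, include_top=False)
      # Same kernels, hence same parameter count; only the output resolution changes
      print(shape, base.count_params(), base.output_shape)

    # Expected output (exact count may vary slightly across Keras versions):
    # (224, 224, 3) 2257984 (None, 7, 7, 1280)
    # (640, 640, 3) 2257984 (None, 20, 20, 1280)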