Tags: keras, tensorflow2.0, transfer-learning

What is the output of an ImageNet model in Keras?


I am trying to understand Transfer learning.

In one of the tutorial's code examples, a MobileNetV2 model pretrained on ImageNet is used.

The output of the model has shape (4, 4, 1280).


Is it the final output layer of the network?

Actually, I was hoping that the model would return the final layer's output, whose dimension would be the number of classes the model was trained on, e.g. (1000,) for a model trained on 1000 different classes of images.

So is this not the output layer? Or is it?

Also, the code I am referring to applies GlobalAveragePooling to this (4, 4, 1280) output.

Why do we do this?


Solution

  • Is it the final output layer of the network?

    No. When loading the model, include_top=False was used. This means we load only the feature extractor for our transfer-learning problem, without the top classification layers, which were trained on a different problem. We hope that the knowledge the network gained while learning to extract features for that problem will be useful for ours.

    import tensorflow as tf

    IMG_SHAPE = (128, 128, 3)  # assumed input size; it reproduces the (4, 4, 1280) output asked about

    base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                                   include_top=False,
                                                   weights='imagenet')
    
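    Inspecting the model's output tensor confirms that it is the bottleneck feature map rather than class scores (a quick check; the exact tensor name can vary across TF versions):

    base_model.output
    <tf.Tensor 'out_relu/Identity:0' shape=(None, 4, 4, 1280) dtype=float32>
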

    Actually, I was hoping that the model would return the final layer's output, whose dimension would be the number of classes the model was trained on, e.g. (1000,) for a model trained on 1000 different classes of images.

    If we load the model as below (include_top=True), we can see the output now has 1000 dimensions.

    base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                                   include_top=True,
                                                   weights='imagenet')
    base_model.output
    <tf.Tensor 'Logits/Identity:0' shape=(None, 1000) dtype=float32>
    
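    Those 1000 dimensions are the ImageNet class scores. As an aside, Keras can map them back to human-readable labels; a quick sketch, where imgs is a placeholder for a batch of suitably preprocessed images (not from the tutorial):

    preds = base_model.predict(imgs)  # imgs: preprocessed image batch (placeholder)
    print(tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=3))
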

    So is this not the output layer? Or is it?

    The one in your example is not; it is known as the bottleneck layer. The one in my example above is.

    Also, the code I am referring to applies GlobalAveragePooling to this (4, 4, 1280) output. Why do we do this?

    It's the first step of appending our own classifier to the feature extractor. GlobalAveragePooling2D averages each of the 1280 feature maps over the 4x4 spatial grid, collapsing the (4, 4, 1280) tensor into a 1280-dimensional vector that a Dense classifier can consume. The new classifier will usually have fewer layers and will end with as many units as there are classes in our problem. When training, we'll freeze the feature extractor (most of the time) and train only the newly appended classifier, as in the sketch below. When fine-tuning, a few layers near the bottleneck layer are unfrozen and retrained.
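
    A minimal sketch of such a classifier head (NUM_CLASSES and the single Dense layer are illustrative placeholders, not taken from the tutorial):

    import tensorflow as tf

    IMG_SHAPE = (128, 128, 3)
    NUM_CLASSES = 5  # placeholder: number of classes in our problem

    base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                                   include_top=False,
                                                   weights='imagenet')
    base_model.trainable = False  # freeze the feature extractor

    model = tf.keras.Sequential([
        base_model,                                # bottleneck features: (None, 4, 4, 1280)
        tf.keras.layers.GlobalAveragePooling2D(),  # average the 4x4 grid -> (None, 1280)
        tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
    ])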