I have several images. I want to use VGG to get a 1D feature vector of 4096 for each image, similar to what is done here:
(They have 700 images as input. They removed the last fully connected layer from VGG16, so the network consists of 13 convolutional layers (Conv), 5 max-pooling layers, and 2 fully connected layers (FC), which generates a 700 × 4096 feature map as its output.)
What is the best way to do it? (Please note: I only need the upper part, i.e., generating a 1×4096 vector per image; the concatenation is not important currently.)
This should be sufficient, with a few caveats. That does not mean the concept is wrong in itself; for example, there are cases where it is important to keep the spatial position encoded, and others where it is useful to work with images of different sizes, for example to run inference at multiple scales and average the results.
To do this in Keras you only need the original VGG model. You can create the new, truncated model from it:
import tensorflow as tf

# Load the full VGG16, including the classification head (fc1, fc2, predictions)
model_base = tf.keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

# Truncate at the second fully connected layer ('fc2'), whose output is the
# 4096-dimensional feature vector
new_input = model_base.input
new_output = model_base.get_layer('fc2').output
model = tf.keras.Model(new_input, new_output)
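A minimal sketch of how the truncated model can then be applied to a batch of images (the file paths are placeholders; `preprocess_input` applies the same channel-mean subtraction VGG16 was trained with):

```python
import numpy as np
import tensorflow as tf

def build_feature_extractor(weights="imagenet"):
    """VGG16 truncated at the second fully connected layer ('fc2')."""
    base = tf.keras.applications.VGG16(include_top=True, weights=weights)
    return tf.keras.Model(base.input, base.get_layer("fc2").output)

def extract_features(model, paths):
    """Load images, resize to VGG16's 224x224 input, apply VGG16
    preprocessing, and return an (N, 4096) feature array."""
    batch = np.stack([
        tf.keras.preprocessing.image.img_to_array(
            tf.keras.preprocessing.image.load_img(p, target_size=(224, 224))
        )
        for p in paths
    ])
    return model.predict(tf.keras.applications.vgg16.preprocess_input(batch))

# model = build_feature_extractor()
# features = extract_features(model, ["img1.jpg", "img2.jpg"])  # shape (2, 4096)
```

Each row of the result is the 1×4096 vector for one image, so stacking them for N images gives the N × 4096 feature map described above.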
This is a very widespread way to extract features; depending on your needs you can keep the FC layers or not. For example, if you look at the original Keras code (https://github.com/keras-team/keras-applications/blob/master/keras_applications/vgg16.py), you will see that with include_top=False the FC part (the classifier) is removed and only the convolutional feature-extraction part is kept, optionally followed by a global pooling layer if you pass pooling='avg' or pooling='max'.
UPDATE:
Does VGG not encode spatial information? Is there another pre-trained network that does preserve it?
What will be the effect of include_top=False on the generated feature map?
What is the best solution? There is no single best solution; it depends on your problem, and in practice the choice is usually not that significant.
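To make the include_top=False effect concrete, here is a small sketch comparing output shapes (weights=None is used only because we care about shapes, not learned features):

```python
import numpy as np
import tensorflow as tf

# With include_top=False the FC head is dropped and the output keeps its
# spatial layout: a 7x7 grid of 512-dim descriptors for a 224x224 input.
conv_model = tf.keras.applications.VGG16(
    include_top=False, weights=None, input_shape=(224, 224, 3)
)
out = conv_model.predict(np.zeros((1, 224, 224, 3), dtype="float32"))
print(out.shape)  # (1, 7, 7, 512)

# Adding pooling="avg" collapses the 7x7 grid to a single 512-dim vector,
# discarding the spatial positions.
pooled_model = tf.keras.applications.VGG16(
    include_top=False, weights=None, pooling="avg", input_shape=(224, 224, 3)
)
print(pooled_model.output_shape)  # (None, 512)
```

So the convolutional output does retain spatial information (each of the 7×7 cells corresponds to a region of the input); it is the FC layers or the global pooling step that mix it away into a single vector.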