I have several images. I want to use VGG to get a 1D feature vector of 4096 for each image, similar to what is done here:
(They have 700 images as input. They removed the last fully connected layer from VGG16, so the network consists of 13 convolutional layers (Conv), 5 max-pooling layers, and 2 fully connected layers (FC), which generates a 700 × 4096 feature map as its output.)
What is the best way to do it? (Please note: I only need the upper part, i.e., generating a 1×4096 vector per image; the concatenation is not important currently.)
This should be sufficient, with a few caveats. That does not mean the concept is wrong in itself; for example, there are cases where it is important to keep the spatial position encoded, and others where it is useful to work with images of different sizes, for example to run inference at multiple scales and average the results.
To do this in Keras you only need the original VGG model. You can create the new, truncated model from it:
import tensorflow as tf

# Load the full VGG16, including the classification head (fc1, fc2, predictions)
model_base = tf.keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

# Truncate at the second fully connected layer ('fc2'), whose output is the
# 4096-dimensional feature vector
new_input = model_base.input
new_output = model_base.get_layer('fc2').output
model = tf.keras.Model(new_input, new_output)
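A minimal sketch of how the truncated model can then be applied to a batch of images (the file paths are placeholders; `preprocess_input` applies the same channel-mean subtraction VGG16 was trained with):

```python
import numpy as np
import tensorflow as tf

def build_feature_extractor(weights="imagenet"):
    """VGG16 truncated at the second fully connected layer ('fc2')."""
    base = tf.keras.applications.VGG16(include_top=True, weights=weights)
    return tf.keras.Model(base.input, base.get_layer("fc2").output)

def extract_features(model, paths):
    """Load images, resize to VGG16's 224x224 input, apply VGG16
    preprocessing, and return an (N, 4096) feature array."""
    batch = np.stack([
        tf.keras.preprocessing.image.img_to_array(
            tf.keras.preprocessing.image.load_img(p, target_size=(224, 224))
        )
        for p in paths
    ])
    return model.predict(tf.keras.applications.vgg16.preprocess_input(batch))

# model = build_feature_extractor()
# features = extract_features(model, ["img1.jpg", "img2.jpg"])  # shape (2, 4096)
```

Each row of the result is the 1×4096 vector for one image, so stacking them for N images gives the N × 4096 feature map described above.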
This is a very widespread way to extract features; depending on your needs you can keep the FC layers or not. For example, if you look at the original Keras code (https://github.com/keras-team/keras-applications/blob/master/keras_applications/vgg16.py), you will see that with include_top=False the FC part (the classifier) is removed and only the convolutional feature-extraction part is kept, optionally followed by a global pooling layer if you pass pooling='avg' or pooling='max'.
UPDATE:
Does VGG not encode spatial information? Is there another pre-trained network that does preserve it?
What will be the effect of include_top=False on the generated feature map?
What is the best solution? There is no single best solution; it depends on your problem, and in practice the choice is usually not that significant.
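To make the include_top=False effect concrete, here is a small sketch comparing output shapes (weights=None is used only because we care about shapes, not learned features):

```python
import numpy as np
import tensorflow as tf

# With include_top=False the FC head is dropped and the output keeps its
# spatial layout: a 7x7 grid of 512-dim descriptors for a 224x224 input.
conv_model = tf.keras.applications.VGG16(
    include_top=False, weights=None, input_shape=(224, 224, 3)
)
out = conv_model.predict(np.zeros((1, 224, 224, 3), dtype="float32"))
print(out.shape)  # (1, 7, 7, 512)

# Adding pooling="avg" collapses the 7x7 grid to a single 512-dim vector,
# discarding the spatial positions.
pooled_model = tf.keras.applications.VGG16(
    include_top=False, weights=None, pooling="avg", input_shape=(224, 224, 3)
)
print(pooled_model.output_shape)  # (None, 512)
```

So the convolutional output does retain spatial information (each of the 7×7 cells corresponds to a region of the input); it is the FC layers or the global pooling step that mix it away into a single vector.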