I am extracting lower-level features from the VGG16 model included as a Keras application. These features are exported as separate outputs and serve as precomputed input data for an add-on classifier. The conceptual idea was borrowed from "Multi-scale recognition with DAG-CNNs".
Using the model without the classifier top, the highest-level features are extracted from the block5 pooling layer using Flatten(): block_05 = Flatten(name='block_05')(block5_pool). This gives an output vector of dimension 8192. Flatten(), however, does not work on the lower pooling layers because the dimensions get too large (memory issues). Instead, lower pooling layers (or any other layer) can be extracted using GlobalAveragePooling2D(): block_04 = GlobalAveragePooling2D(name='block_04')(block4_pool). The problem with this approach, however, is that the dimension of the feature vector shrinks rapidly the lower you go: block_4 (512), block_3 (256), block_2 (128), block_1 (64).
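To make the size trade-off concrete, here is a small arithmetic sketch of the feature vector length each pooling layer would yield under Flatten() versus GlobalAveragePooling2D(). It assumes a square 128x128 input, which is inferred from the 8192 figure above (4 x 4 x 512) and is not stated explicitly; the names `VGG16_POOL_LAYERS` and `feature_sizes` are made up for illustration.

```python
# (layer name, total downsampling factor, channels) for the VGG16 pooling layers.
VGG16_POOL_LAYERS = [
    ("block1_pool", 2, 64),
    ("block2_pool", 4, 128),
    ("block3_pool", 8, 256),
    ("block4_pool", 16, 512),
    ("block5_pool", 32, 512),
]


def feature_sizes(input_size=128):
    """Return {layer: (flatten_dim, global_avg_pool_dim)} for a square input.

    input_size=128 is an assumption inferred from the 8192-dim block5 output.
    """
    sizes = {}
    for name, factor, channels in VGG16_POOL_LAYERS:
        side = input_size // factor  # spatial side length after pooling
        sizes[name] = (side * side * channels, channels)
    return sizes


for name, (flat, gap) in feature_sizes().items():
    print(f"{name}: Flatten={flat:>7}, GlobalAveragePooling2D={gap}")
```

Under that assumption, Flatten() grows by a factor of two to four per block going down, while global average pooling keeps only one value per channel.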
What would be a suitable layer or set-up to retain more feature data from the lower layers?
For info, the output of the model looks like this; the add-on classifier has a corresponding number of inputs.
# Create model, output data in reverse order from top to bottom
# (Keras 1 keyword names; Keras 2 uses inputs= and outputs=)
model = Model(input=img_input, output=[block_05,   # ch_00, layer 17, dim 8192
                                       block_04,   # ch_01, layer 13, dim 512
                                       block_03,   # ch_02, layer 9, dim 256
                                       block_02,   # ch_03, layer 5, dim 128
                                       block_01])  # ch_04, layer 2, dim 64
The memory error you mentioned comes from flattening a huge array, which makes the number of units extremely large. What you actually need to do is downsample the feature maps in a smart way. Here are some ways to do this:
MaxPooling: with a simple pooling layer you can first downsample your feature maps and then Flatten them. The main advantage of this approach is its simplicity and that it needs no additional parameters. The main disadvantage: it might be a really rough method.
Convolution2D layers (Conv2D in Keras 2) with heavy subsampling, e.g. with filter size (4, 4) and subsample (4, 4) (called strides in Keras 2). This might be considered intelligent pooling. The main disadvantage of this method is the additional parameters it requires.
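Both options can be sketched as follows. This is only an illustration under assumptions: it uses the Keras 2 names (`Conv2D`, `strides`) via `tensorflow.keras`, it replaces the real `block4_pool` tensor with a stand-in input of shape (8, 8, 512), which is what VGG16 would produce for a 128x128 image, and the filter count of 128 is an arbitrary choice.

```python
from tensorflow.keras import layers, Model

# Stand-in for VGG16's block4_pool output (assumes a 128x128 input image).
block4_pool = layers.Input(shape=(8, 8, 512), name="block4_pool")

# Option 1: parameter-free downsampling, then flatten.
# 8x8x512 -> 2x2x512 -> vector of length 2048.
pooled = layers.MaxPooling2D(pool_size=(4, 4), name="block_04_pool")(block4_pool)
block_04_maxpool = layers.Flatten(name="block_04_maxpool")(pooled)

# Option 2: learned downsampling with a strided convolution, then flatten.
# 8x8x512 -> 2x2x128 -> vector of length 512 (128 filters is an arbitrary choice).
conv = layers.Conv2D(filters=128, kernel_size=(4, 4), strides=(4, 4),
                     activation="relu", name="block_04_conv")(block4_pool)
block_04_conv = layers.Flatten(name="block_04_conv_flat")(conv)

model = Model(inputs=block4_pool, outputs=[block_04_maxpool, block_04_conv])

print("max-pool features:", block_04_maxpool.shape[-1])
print("strided-conv features:", block_04_conv.shape[-1])
print("trainable parameters added by the convolution:", model.count_params())
```

The pooling branch adds no weights, while the convolution branch adds 4 * 4 * 512 * 128 weights plus 128 biases that have to be trained together with the add-on classifier.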