neural-network, deep-learning, conv-neural-network, dimensionality-reduction, faster-rcnn

Reducing the Spatial Dimensions of a Neural Network Feature Map


Given a feature map of dimensionality MxNxC (for example, the features of a predicted Region of Interest from a Faster-RCNN), how would one reduce the spatial dimensions to 1x1xC? That is, how would one reduce the feature map to a vector-like quantity summarizing the features of the region?

I am aware of the 1x1 convolution, but this seems relevant only to reducing the number of channels. Average and max pooling are also commonly used, but these approaches seem better suited to less extreme subsampling.

Obviously one may simply compute the mean over the spatial dimensions, but this seems rather coarse.


Solution

  • I recommend using a global average pooling layer. You have an MxNxC feature map, i.e. C feature maps of size MxN. Global average pooling computes the average of each of those maps over its spatial dimensions, so each feature map becomes a single number and the set of feature maps becomes a vector of length C.

    I recommend this article as a starting point for exploring global average pooling layers.

    https://alexisbcook.github.io/2017/global-average-pooling-layers-for-object-localization/
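
As a rough illustration of the idea above, here is a minimal NumPy sketch. It assumes a channels-last layout (M, N, C); the function names and the 7x7x512 input are hypothetical and chosen only for the example. Note that global average pooling is the same operation as taking the mean over the spatial dimensions; a global max pooling variant is included for comparison.

```python
import numpy as np


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Collapse an (M, N, C) feature map to a length-C vector
    by averaging each channel over the two spatial axes."""
    if feature_map.ndim != 3:
        raise ValueError("expected an array of shape (M, N, C)")
    return feature_map.mean(axis=(0, 1))


def global_max_pool(feature_map: np.ndarray) -> np.ndarray:
    """Same reduction, but keep the maximum activation per channel."""
    if feature_map.ndim != 3:
        raise ValueError("expected an array of shape (M, N, C)")
    return feature_map.max(axis=(0, 1))


# Hypothetical RoI feature map with M=7, N=7, C=512 (assumed sizes).
rng = np.random.default_rng(0)
roi_features = rng.standard_normal((7, 7, 512))

pooled = global_average_pool(roi_features)
print(pooled.shape)  # (512,)
```

Deep learning frameworks expose the same reduction as a layer, so in practice you would normally use the framework's global pooling layer rather than writing it by hand.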