Tags: machine-learning, neural-network, computer-vision, conv-neural-network, max-pooling

Does omitting pooling layers in CNNs make sense in some cases?


I know that a typical CNN consists of both convolutional and pooling layers. Pooling layers make the output smaller, which means fewer computations, and they also provide a degree of translation invariance: the position of a feature detected by a kernel filter can shift slightly in the original image without changing the output much.

But what happens when I don't use pooling layers? My reason would be that I want a feature vector for each pixel of the original image, so the output of the convolutional layers has to be the same size as the image, just with more channels. Does this make sense? Will these feature vectors still contain useful information, or are pooling layers necessary in CNNs? Or are there approaches to get feature vectors for individual pixels while keeping the pooling layers?
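For concreteness, here's a minimal sketch of what I mean (using tf.keras purely as an illustration; the exact framework doesn't matter):

```python
import tensorflow as tf

# A 'same'-padded convolution keeps the spatial size of the input,
# so every pixel ends up with its own feature vector (one entry per channel).
x = tf.random.normal([1, 224, 224, 3])  # a batch of one RGB image
conv = tf.keras.layers.Conv2D(64, 3, padding='same')
pool = tf.keras.layers.MaxPool2D(2)

print(conv(x).shape)        # (1, 224, 224, 64) -- per-pixel 64-d features
print(pool(conv(x)).shape)  # (1, 112, 112, 64) -- pooling halves H and W
```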


Solution

  • Convolutional feature maps, both early and late ones, contain a lot of useful information. Many interesting and fun applications are built directly on the feature maps of pre-trained CNNs, e.g. Google Deep Dream and Neural Style. A common choice of pre-trained model is VGGNet, for its simplicity.

    Also note that some CNNs, e.g. the All Convolutional Net, replace pooling layers with convolutional ones. They still downsample, through striding, but completely avoid maxpool and avgpool operations. This idea has become popular and is applied in many modern CNN architectures (see the first sketch after this answer).

    The only difficulty is that a CNN without downsampling may be harder to train. You need enough training data in which the labels are themselves images (I assume you have that), and you'll also need a suitable loss function for backpropagation. A plain L2 norm of the pixel difference is a reasonable starting point, but the right choice really depends on the problem you're solving (a minimal training setup is sketched below).

    My recommendation would be to take an existing pre-trained CNN (e.g. VGGNet for TensorFlow) and keep just the first two convolutional layers, up until the first downsampling. This is a fast way to try this kind of architecture (see the last sketch below).
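To illustrate the All Convolutional Net idea mentioned above, here is a minimal tf.keras sketch (layer sizes are arbitrary) comparing the classic conv + maxpool pattern with a strided convolution that does the downsampling itself:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))
h = tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu')(inputs)

# (a) classic downsampling: an explicit max-pooling layer
pooled = tf.keras.layers.MaxPool2D(pool_size=2)(h)

# (b) All Convolutional Net style: a stride-2 convolution downsamples
# by itself, so no pooling layer is needed
strided = tf.keras.layers.Conv2D(64, 3, strides=2, padding='same',
                                 activation='relu')(h)

print(pooled.shape, strided.shape)  # both (None, 112, 112, 64)
```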
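As a concrete (hypothetical) starting point for training without downsampling, the sketch below builds a tiny fully-convolutional model with no pooling, trained with a plain L2 (MSE) pixel loss; `x_train`/`y_train` are placeholder names for your image data:

```python
import tensorflow as tf

# No pooling and 'same' padding everywhere: the output keeps the input's
# height and width, i.e. one predicted vector per pixel.
inputs = tf.keras.Input(shape=(None, None, 3))
h = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(inputs)
h = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(h)
outputs = tf.keras.layers.Conv2D(3, 1, padding='same')(h)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')  # L2 norm of the pixel difference

# x_train and y_train are hypothetical image tensors of equal spatial size:
# model.fit(x_train, y_train, epochs=10)
```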
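And a sketch of the recommendation itself, assuming the Keras VGG16 weights are available: load the pre-trained network and cut it off after `block1_conv2`, the last layer before the first pooling step:

```python
import tensorflow as tf

# Pre-trained VGG16 without its classifier head
vgg = tf.keras.applications.VGG16(weights='imagenet', include_top=False)

# Keep only the first two conv layers (everything before the first pooling)
features = tf.keras.Model(inputs=vgg.input,
                          outputs=vgg.get_layer('block1_conv2').output)

# The feature maps match the input's spatial size: a 64-d vector per pixel.
# (For real images, preprocess with tf.keras.applications.vgg16.preprocess_input.)
img = tf.random.normal([1, 224, 224, 3])
print(features(img).shape)  # (1, 224, 224, 64)
```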