tensorflow, convolution, pooling

How to imagine convolution/pooling on images with 3 color channels


I am a beginner and I have worked through the MNIST tutorials. Now I want to get something going on the SVHN dataset. In contrast to MNIST, it comes with 3 color channels. I am having a hard time visualizing how convolution and pooling work with the additional dimension of the color channels.

Does anyone have a good way to think about it, or a link for me?

I appreciate all input :)


Solution

  • This is very simple; the only difference lies in the first convolution:

    • For grey images, the input shape is [batch_size, H, W, 1], so your first convolution (let's say 3x3) has a filter of shape [3, 3, 1, 32] if you want 32 output channels.
    • For RGB images, the input shape is [batch_size, H, W, 3], so your first convolution (still 3x3) has a filter of shape [3, 3, 3, 32].

    Each of the 32 filters spans all input channels and sums over them, so in both cases the output shape (with stride 1 and "SAME" padding) is [batch_size, H, W, 32]. Every later layer therefore looks the same as in the grey case, and pooling simply acts on each channel separately.
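    To make the shapes concrete, here is a minimal NumPy sketch (not TensorFlow itself) of a stride-1, zero-padded convolution using the same filter layout, [filter_height, filter_width, in_channels, out_channels], plus a 2x2 max pool. The function names conv2d_same and max_pool_2x2, the image sizes, and the random inputs are illustrative assumptions, and the loop is written for readability rather than speed.

```python
import numpy as np

def conv2d_same(images, filters):
    """Naive stride-1, zero-padded convolution (illustrative helper).

    images:  [batch, height, width, in_channels]
    filters: [k_h, k_w, in_channels, out_channels]
    """
    batch, height, width, in_ch = images.shape
    k_h, k_w, f_in, out_ch = filters.shape
    assert in_ch == f_in, "filter depth must match the input channels"
    pad_h, pad_w = k_h // 2, k_w // 2  # assumes odd kernel sizes
    padded = np.pad(images, ((0, 0), (pad_h, pad_h), (pad_w, pad_w), (0, 0)))
    out = np.zeros((batch, height, width, out_ch))
    for i in range(height):
        for j in range(width):
            patch = padded[:, i:i + k_h, j:j + k_w, :]  # [batch, k_h, k_w, in_ch]
            # Sum over the spatial window AND over all input channels,
            # leaving one number per output channel.
            out[:, i, j, :] = np.tensordot(patch, filters, axes=([1, 2, 3], [0, 1, 2]))
    return out

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; each channel is pooled independently."""
    batch, height, width, ch = x.shape
    x = x[:, :height - height % 2, :width - width % 2, :]  # drop an odd last row/column
    return x.reshape(batch, height // 2, 2, width // 2, 2, ch).max(axis=(2, 4))

rng = np.random.default_rng(0)
grey = rng.random((2, 28, 28, 1))       # MNIST-like batch, 1 channel
rgb = rng.random((2, 32, 32, 3))        # SVHN-like batch, 3 channels
grey_filters = rng.random((3, 3, 1, 32))
rgb_filters = rng.random((3, 3, 3, 32))

grey_out = conv2d_same(grey, grey_filters)
rgb_out = conv2d_same(rgb, rgb_filters)
pooled = max_pool_2x2(rgb_out)

print(grey_out.shape)  # (2, 28, 28, 32)
print(rgb_out.shape)   # (2, 32, 32, 32)
print(pooled.shape)    # (2, 16, 16, 32)
```

    Only the third dimension of the filter changes between the two cases. After the first convolution the color channels are gone as a separate axis: they have been mixed into the 32 feature maps, and pooling halves the height and width of each map without touching the channel count.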