I am a beginner and I understood the MNIST tutorials. Now I want to get something going on the SVHN dataset. In contrast to MNIST, it comes with 3 color channels. I am having a hard time visualizing how convolution and pooling work with the additional dimension of the color channels.
Does anyone have a good way to think about it, or a link for me?
I appreciate all input :)
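This is simpler than it looks: the only difference lies in the first convolution.
For MNIST (grayscale), the input has shape [batch_size, W, H, 1],
so your first convolution (let's say 3x3) has a filter of shape [3, 3, 1, 32]
if you want to have 32 output channels afterwards.
For SVHN (RGB), the input has shape [batch_size, W, H, 3],
so your first convolution (still 3x3) has a filter of shape [3, 3, 3, 32].
In both cases, the output shape (with stride 1 and 'SAME' padding) is [batch_size, W, H, 32]. Each filter spans all input channels and sums over them, so the color dimension is gone after the first layer, and everything after it (including pooling) works exactly as in the MNIST tutorial.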
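To make the shapes concrete, here is a minimal pure-Python sketch of a stride-1, zero-padded convolution on a single image. It is not how TensorFlow implements it, just an illustration of where the channel dimension goes. The helper names (conv2d_same, shape, constant) and the 8x8 image size are made up for this example.

```python
def conv2d_same(image, filters):
    """Naive stride-1, zero-padded ("SAME") convolution for one image.

    image:   nested lists of shape [H][W][C_in]
    filters: nested lists of shape [kH][kW][C_in][C_out]
    returns: nested lists of shape [H][W][C_out]
    """
    height, width, c_in = len(image), len(image[0]), len(image[0][0])
    k_h, k_w = len(filters), len(filters[0])
    c_out = len(filters[0][0][0])
    assert len(filters[0][0]) == c_in, "filter depth must match input channels"
    pad_h, pad_w = k_h // 2, k_w // 2

    output = [[[0.0] * c_out for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy in range(k_h):
                for dx in range(k_w):
                    yy, xx = y + dy - pad_h, x + dx - pad_w
                    if not (0 <= yy < height and 0 <= xx < width):
                        continue  # zero padding contributes nothing
                    for c in range(c_in):  # sum over ALL input channels
                        pixel = image[yy][xx][c]
                        for o in range(c_out):
                            output[y][x][o] += pixel * filters[dy][dx][c][o]
    return output


def shape(nested):
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


def constant(value, *dims):
    """Build a nested list of the given shape filled with `value`."""
    if not dims:
        return value
    return [constant(value, *dims[1:]) for _ in range(dims[0])]


gray = constant(1.0, 8, 8, 1)              # one MNIST-like image, 1 channel
rgb = constant(1.0, 8, 8, 3)               # one SVHN-like image, 3 channels
gray_filters = constant(1.0, 3, 3, 1, 32)  # [3, 3, 1, 32]
rgb_filters = constant(1.0, 3, 3, 3, 32)   # [3, 3, 3, 32]

gray_out = conv2d_same(gray, gray_filters)
rgb_out = conv2d_same(rgb, rgb_filters)
print(shape(gray_out), shape(rgb_out))     # both are [8, 8, 32]
```

The only place the 3 channels show up is the innermost sum over c: an RGB filter is a small 3x3x3 block rather than a 3x3 square, and it still produces one number per output position. Pooling afterwards is applied to each of the 32 output channels separately, so it does not care whether the original image was grayscale or color.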