Say we have a single-channel image A (5x5)
A = [ 1 2 3 4 5
6 7 8 9 2
1 4 5 6 3
4 5 6 7 4
3 4 5 6 2 ]
And a filter K (2x2)
K = [ 1 1
1 1 ]
An example of applying convolution (let us take the first 2x2 from A) would be
1*1 + 2*1 + 6*1 + 7*1 = 16
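To make the single-channel case concrete, here is a minimal sketch in plain Python that slides K over every 2x2 window of A, assuming stride 1 and no padding (neither is stated above). The helper name `conv2d_valid` is made up for illustration. Like most CNN libraries, it does not flip the kernel, which makes no difference for an all-ones K.

```python
# 5x5 single-channel image and 2x2 filter from the question.
A = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 2],
    [1, 4, 5, 6, 3],
    [4, 5, 6, 7, 4],
    [3, 4, 5, 6, 2],
]
K = [
    [1, 1],
    [1, 1],
]


def conv2d_valid(image, kernel):
    """Slide kernel over image (assumed stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


out = conv2d_valid(A, K)
print(out[0][0])  # 16, the value worked out by hand above
for row in out:   # the full 4x4 output
    print(row)
```

With a 5x5 input and a 2x2 filter, these assumptions give a 4x4 output.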
This is very straightforward. But suppose we add a depth dimension to A, e.g. an RGB image with 3 channels, or the output of a conv layer in a deep network (with depth = 512, say). How would the convolution be computed with the same filter? A similar worked example for the RGB case would be really helpful.
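It works the same way as with a single-channel image, except that you get three matrices instead of one: the filter is applied to each channel separately. In a standard convolutional layer the filter has one 2x2 slice per input channel, and the per-channel results are then summed element-wise into a single feature map. This is a lecture note about CNN fundamentals, which I think might be helpful for you.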
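A worked RGB example might look like the sketch below. The channel values are hypothetical, since the question gives none: R reuses A, G is all ones, and B is all zeros. It also assumes the same 2x2 K is used for every channel, with stride 1 and no padding. The helper `conv2d_valid` is an illustrative name, not a library function.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (assumed stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 RGB image: R is the matrix A from the question,
# G and B are made-up constant channels chosen to keep the arithmetic easy.
R = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 2],
    [1, 4, 5, 6, 3],
    [4, 5, 6, 7, 4],
    [3, 4, 5, 6, 2],
]
G = [[1] * 5 for _ in range(5)]
B = [[0] * 5 for _ in range(5)]
image = [R, G, B]

# Assumption: the filter has depth 3, with the same 2x2 K in every slice.
K = [[1, 1], [1, 1]]
kernel = [K, K, K]

# Step 1: one 4x4 matrix per channel (the "three matrices").
per_channel = [conv2d_valid(ch, k) for ch, k in zip(image, kernel)]

# Step 2: sum the three matrices element-wise into one feature map,
# which is what a standard conv layer outputs for a single filter.
feature_map = [
    [sum(m[i][j] for m in per_channel) for j in range(len(per_channel[0][0]))]
    for i in range(len(per_channel[0]))
]

print(per_channel[0][0][0])  # 16 (R channel, same as the single-channel case)
print(feature_map[0][0])     # 20 = 16 (R) + 4 (G) + 0 (B)
```

For an input of depth 512 the idea is the same: the filter has 512 slices, and the 512 per-channel results are summed into one output matrix per filter.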