Tags: machine-learning, computer-vision, conv-neural-network

What is the output dimension of a CNN layer and how does it work?


When I have a 64x64 image with 3 channels and I pass it through a conv layer with a 3x3 kernel and 32 filters, what happens? Specifically, what gets passed to the next conv layer, and what input dimension will that next layer receive?

I'm confused about how each filter slides through the input.


Solution

  • Here's what happens step by step:

    1. Input Image: You have a 64x64 image with 3 color channels, resulting in a shape of 64x64x3.
    2. Convolution with 32 Filters: You apply a convolutional layer with 32 filters, each with a 3x3 kernel. Because the input has 3 channels, each filter actually has weights of shape 3x3x3, spanning all input channels. Each filter slides across the image in a 2D grid pattern; at each position it performs element-wise multiplications between its weights and the corresponding 3x3x3 patch of the input, and the products are summed (plus a bias) to produce a single value for that position. Repeating this for all 32 filters yields 32 separate feature maps (also called activation maps).
    3. Output Feature Maps: Each of the 32 filters produces a feature map that highlights a specific pattern or feature in the input image. These feature maps are stacked along the depth dimension. With "same" padding the spatial size is preserved, giving an output volume of 64x64x32; with "valid" (no) padding and stride 1, a 3x3 kernel shrinks each spatial dimension by 2, giving 62x62x32. This output volume is what gets passed to the next layer, whether that is another convolutional layer, a pooling layer, or a fully connected layer, depending on the architecture of your CNN (see the shape check after this list).

    You can also learn more in this thread: Calculate the output size in convolution layer
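    Here is a minimal sketch (assuming PyTorch; the same shapes fall out of Keras or any other framework) that verifies the shapes described above for both padding modes:

    ```python
    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 64, 64)  # (batch, channels, height, width): one 64x64 RGB image

    # padding=1 acts as "same" padding for a 3x3 kernel, so the spatial size stays 64x64
    conv_same = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
    print(conv_same(x).shape)       # torch.Size([1, 32, 64, 64])

    # padding=0 ("valid") shrinks each spatial dimension by kernel_size - 1 = 2
    conv_valid = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=0)
    print(conv_valid(x).shape)      # torch.Size([1, 32, 62, 62])

    # Each of the 32 filters spans all 3 input channels, so the weight tensor is (32, 3, 3, 3)
    print(conv_same.weight.shape)   # torch.Size([32, 3, 3, 3])
    ```

    The general rule is output_size = (input_size - kernel_size + 2*padding) / stride + 1, so with a 3x3 kernel, stride 1, and no padding you get (64 - 3 + 0)/1 + 1 = 62 per spatial dimension, and the depth of the output always equals the number of filters.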