tensorflow · machine-learning · conv-neural-network · onnx

Dimensions of a convolution?


I have some questions regarding how this convolution is calculated and its output dimensions. I'm familiar with simple convolutions with an nxm kernel, using strides, dilations or padding, so that's not a problem, but these dimensions seem odd to me. Since the model I'm using is the pretty well-known onnx-mnist, I assume it is correct.

So, my point is:

  • If the input has dimensions of 1x1x28x28, how is the output 1x8x28x28?
  • W denotes the kernel. How can it be 8x1x5x5? As far as I know, the first dimension is the batch size, but here I'm only doing inference with a single input. Does this make sense?
  • I'm implementing this convolution operator from scratch, and so far it works for a 1x1x28x28 input and a 1x1x5x5 kernel, but that extra dimension doesn't make sense to me.

Attached is the convolution that I'm trying to do; I hope it's not too ONNX-specific.

[image: model graph]


Solution

  • I do not see the code you are using, but I guess 8 is the number of kernels. This means you apply 8 different 5x5 kernels to your input over a batch size of 1. That is how you get 1x8x28x28 in the output; the 8 denotes the number of activation maps (one for each kernel). A small sketch after the list below makes the shape bookkeeping concrete.

    The numbers of your kernel dimensions (8x1x5x5) explained:

    • 8: Number of different filters/kernels (this will be the number of output activation maps per image)
    • 1: Number of input channels. If your input image were RGB instead of grayscale, this would be 3 instead of 1.
    • 5: First spatial dimension of the kernel (height)
    • 5: Second spatial dimension of the kernel (width)
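
    Here is a minimal NumPy sketch of a naive NCHW convolution (stride 1, no dilation) that shows how an 8x1x5x5 kernel turns a 1x1x28x28 input into a 1x8x28x28 output. It assumes a zero padding of 2 on each side, which is what keeps the 5x5 kernel's output at 28x28; check the Conv node's pads attribute in your ONNX model for the exact value.

    ```python
    import numpy as np

    def conv2d(x, w, pad):
        """Naive NCHW convolution.

        x: input of shape (N, C_in, H, W)
        w: kernels of shape (C_out, C_in, kH, kW)
        pad: symmetric zero padding applied to H and W (assumed to be 2 here)
        """
        n, c_in, h, wdt = x.shape
        c_out, c_in_w, kh, kw = w.shape
        assert c_in == c_in_w, "input channels must match kernel channels"

        # Zero-pad the spatial dimensions only.
        xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))

        out_h = h + 2 * pad - kh + 1
        out_w = wdt + 2 * pad - kw + 1
        out = np.zeros((n, c_out, out_h, out_w))

        for b in range(n):                  # batch
            for k in range(c_out):          # one activation map per kernel
                for i in range(out_h):
                    for j in range(out_w):
                        patch = xp[b, :, i:i + kh, j:j + kw]    # (C_in, kH, kW)
                        out[b, k, i, j] = np.sum(patch * w[k])  # sum over all input channels
        return out

    # Shapes from the question: 1x1x28x28 input, 8x1x5x5 kernel.
    x = np.random.randn(1, 1, 28, 28)
    w = np.random.randn(8, 1, 5, 5)
    y = conv2d(x, w, pad=2)
    print(y.shape)  # (1, 8, 28, 28)
    ```

    Note that each of the 8 kernels is summed over all input channels, so the input channel dimension disappears from the output and is replaced by the number of kernels.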