machine-learning · computer-vision · convolution · caffe · pycaffe

Why does the output of this convolution have these dimensions?


I am trying to replicate the output of a convolution in Caffe.

As far as I understand, Caffe uses the im2col algorithm to cast nD arrays into matrices and multiply them together. However, the dimensions of the output in Caffe confuse me.
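For reference, the im2col idea can be sketched in a few lines of NumPy. This is a single-channel toy example, not Caffe's actual implementation: every k×k patch is unrolled into a column, so the convolution collapses into one matrix product.

```python
import numpy as np

def im2col(image, k):
    # Unroll every kxk patch of a single-channel image into a column,
    # so convolution becomes a single matrix product (the im2col trick).
    h, w = image.shape
    cols = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            cols.append(image[i:i + k, j:j + k].ravel())
    return np.stack(cols, axis=1)  # shape (k*k, out_h*out_w)

image = np.arange(25.0).reshape(5, 5)
kernel = np.ones((3, 3))

cols = im2col(image, 3)                       # (9, 9): 9 patch positions
out = (kernel.ravel() @ cols).reshape(3, 3)   # (5-3+1) x (5-3+1) outputs
# out[0, 0] == 54.0: the sum of the top-left 3x3 patch of `image`
```

Note that the output grid is already 3×3 here, not 5×5 — the patch extraction only visits positions where the kernel fits entirely inside the image.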

Using the ImageData layer, I input 4 images of dimension 150x149 with a batch size of 4. Caffe creates a 4D array with dimensions 4x3x149x150.

I convolve these with a convolution layer using a 7x7 filter and a stride of 1 (num_output = 1, bias fixed to the zero vector), so the weights have dimensions 1x3x7x7. As far as I understand, with a stride of 1 the filter is applied at every element, so the output should have the same dimensions as the input. What I actually get, however, is an output of dimensions 4x1x143x144.

I don't see how this is possible. How would one carry out the same operations in, say, Matlab? How do you get from the input to the output?


Solution

  • Your convolution filter has width 7. Looking at it in 1D, its first application will be to the pixels

    1, 2, 3, 4, 5, 6, 7,
    

    the second one to the pixels

    2, 3, 4, 5, 6, 7, 8,
    

    and so on until the last one, which operates on pixels

    143, 144, 145, 146, 147, 148, 149.
    

    So, as you can see, there are 143 distinct positions for the filter, and each one produces exactly one output pixel; hence the output size along this axis is 143. The same reasoning along the other axis (width 150) gives 150 - 7 + 1 = 144.

    In short, the output width for a filter with stride 1 and no zero-padding (Caffe's default, pad = 0) will always be

    output width = image width - filter width + 1.
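The formula above can be checked numerically with a naive "valid" convolution. A minimal NumPy sketch, using random data of the same shapes as in the question (note Caffe's convolution layer does not flip the kernel, i.e. it computes a cross-correlation, so none is done here either):

```python
import numpy as np

def valid_conv_output_shape(in_h, in_w, k_h, k_w, stride=1):
    # "Valid" convolution (pad = 0): the filter only sits at
    # positions fully inside the image.
    return (in_h - k_h) // stride + 1, (in_w - k_w) // stride + 1

def valid_conv2d(image, kernel):
    # Naive single-channel valid convolution without kernel flipping.
    k_h, k_w = kernel.shape
    out_h, out_w = valid_conv_output_shape(*image.shape, k_h, k_w)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k_h, j:j + k_w] * kernel)
    return out

# One 3x149x150 input convolved with a 3x7x7 filter:
# the per-channel results are summed, as in a conv layer with num_output = 1.
rng = np.random.default_rng(0)
image = rng.standard_normal((3, 149, 150))
weights = rng.standard_normal((3, 7, 7))
result = sum(valid_conv2d(image[c], weights[c]) for c in range(3))
print(result.shape)  # (143, 144), matching Caffe's 4x1x143x144 per image
```

With a batch of 4 such images, stacking the four per-image results reproduces the 4x1x143x144 blob from the question.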