neural-network conv-neural-network yolo darknet

Understanding weird YOLO convolutional layer output size


I am trying to understand how Darknet works, and I was looking at the yolov3-tiny configuration file, specifically layer number 13 (line 107).

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

The size of the kernel is 1x1, the stride is 1 and the padding is 1 too. When I load the network using darknet, it indicates that the output width and height are the same as the input:

13 conv    256       1 x 1/ 1     13 x  13 x1024 ->   13 x  13 x 256

However, shouldn't the width and height increase by 2, since the kernel is 1x1 and there is padding of 1? If I understand it correctly, the kernel runs over every "pixel" of the input plus the padding, so it makes sense to me that the width and height should increase by 2*padding.

I used the formula

output_size = ((input_size - kernel_size + 2*padding) / stride) + 1

and it agrees with what I expected: (13 - 1 + 2 * 1) / 1 + 1 = 15

Does anybody know what I'm missing?

Thank you in advance.


Solution

  • I figured it out.

    I misunderstood the pad parameter in the layer. If you want the padding to be 1, you should write:

    padding=1
    

    pad is actually a boolean. When set to one, the padding of the layer will be equal to size / 2.

    In this case, the size of the kernel was 1, so the padding ends up being 1/2 = 0 (integer division). Since there is no padding, the output width and height remain the same as the input's.

    I should've RTFM.
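
For anyone who wants to check the numbers, here is a minimal Python sketch of the rule described above. It is not Darknet's actual code: the function names effective_padding and output_size are made up for illustration, and the only assumption is that pad acts as a flag that replaces the padding with size / 2 using integer division.

```python
def effective_padding(size: int, pad: int = 0, padding: int = 0) -> int:
    """Padding actually applied to the layer (hypothetical helper)."""
    # pad is treated as a boolean flag: when it is set, the padding
    # becomes size // 2; otherwise the explicit padding value is used.
    return size // 2 if pad else padding


def output_size(input_size: int, size: int, stride: int,
                pad: int = 0, padding: int = 0) -> int:
    """Output width or height, using the formula from the question."""
    p = effective_padding(size, pad, padding)
    return (input_size - size + 2 * p) // stride + 1


# Layer 13 as written in the cfg: size=1, stride=1, pad=1
print(output_size(13, size=1, stride=1, pad=1))      # 13

# What I originally expected, which needs padding=1 instead
print(output_size(13, size=1, stride=1, padding=1))  # 15

# A 3x3 kernel with pad=1 gets padding 1, so the size is also preserved
print(output_size(13, size=3, stride=1, pad=1))      # 13
```

With pad=1 and stride 1, an odd kernel size always gives back the input size, which is why the 1x1 layer prints 13 x 13 on both sides.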