
What does it mean for a pixel to be "knocked off" in a CNN?


I'm reading a book in which a section introduces how a kernel works in a CNN: https://freecontent.manning.com/deep-learning-for-image-like-data/.

Sliding a kernel over an image and requiring that the whole kernel is at each position completely within the image, yields to an activation map with reduced dimensions. For example, if you’ve a 3 x 3 kernel on all sides, one pixel is knocked off in the resulting activation map; in case of a 5 x 5 kernel, even two pixels.

What does it mean here for one or two pixels to be knocked off?


Solution

  • They mean that, without extra padding, a 3x3 kernel will "lose" one pixel per side in the output. So if your input image is NxN, the output will be (N-2)x(N-2). Likewise, a 5x5 kernel loses two pixels per side, giving (N-4)x(N-4).

    For example, with N=5 you can see that when the kernel "fits" into the lower right corner, its center is one pixel away from the image border on both the horizontal and vertical axes.

    a a a a a           . . . . .
    a a a a a           . b b b .
    a a x x x    ===>   . b b b .
    a a x X x           . b b B . 
    a a x x x           . . . . .
    
     5 x 5                3 x 3
    

    To avoid this, various padding strategies are used, e.g. "surrounding your picture" with 0s so that the size is preserved:

    0 0 0 0 0 0 0            . . . . . . .
    0 a a a a a 0            . b b b b b .
    0 a a a a a 0            . b b b b b .
    0 a a a a a 0     ===>   . b b b b b .
    0 a a a x x x            . b b b b b .
    0 a a a x X x            . b b b b B .
    0 0 0 0 x x x            . . . . . . .
    
     5 x 5 + pad                5 x 5
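
The two diagrams above can be checked with a small sketch in plain Python. It assumes a square image, a square kernel, and a stride of 1. The names `output_size` and `slide_kernel` are made up for illustration and do not come from the book or from any library.

```python
def output_size(n, k, pad=0):
    """Side length of the activation map for an n x n input and a k x k kernel,
    assuming stride 1 and `pad` rings of padding around the image."""
    return n + 2 * pad - k + 1


def slide_kernel(image, kernel, pad=0):
    """Slide `kernel` over `image` (both square lists of lists), stride 1,
    after surrounding the image with `pad` rings of zeros."""
    n, k = len(image), len(kernel)

    # Build the zero-padded copy of the image.
    size = n + 2 * pad
    padded = [[0] * size for _ in range(size)]
    for r in range(n):
        for c in range(n):
            padded[r + pad][c + pad] = image[r][c]

    # Only positions where the whole kernel fits inside `padded` are visited.
    out_n = size - k + 1
    out = [[0] * out_n for _ in range(out_n)]
    for r in range(out_n):
        for c in range(out_n):
            out[r][c] = sum(
                padded[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
    return out


image = [[1] * 5 for _ in range(5)]      # the 5 x 5 input of "a" pixels
kernel3 = [[1] * 3 for _ in range(3)]    # a 3 x 3 kernel
kernel5 = [[1] * 5 for _ in range(5)]    # a 5 x 5 kernel

print(len(slide_kernel(image, kernel3)))         # 3: one pixel lost per side
print(len(slide_kernel(image, kernel5)))         # 1: two pixels lost per side
print(len(slide_kernel(image, kernel3, pad=1)))  # 5: size preserved by padding
```

With one ring of zeros, the corner outputs only overlap a 2 x 2 patch of real pixels, which is why border values differ from interior ones even though the size is preserved.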