image-processing computer-vision edge-detection

Why does the Sobel operator look that way?


For image derivative computation, the Sobel operator looks like this:

[-1 0 1]
[-2 0 2]
[-1 0 1]

I don't quite understand two things about it:

1. Why is the centre pixel 0? Can't I just use an operator like the one below?

[-1 1]
[-1 1]
[-1 1]

2. Why is the centre row 2 times the other rows?

I googled my questions but didn't find any answer that convinced me. Please help me.


Solution

  • In computer vision, there is very often no perfect, universal way of doing something. Most often, we just try an operator, look at its results and check whether they fit our needs. This is true for gradient computation too: the Sobel operator is one of many ways of computing an image gradient, and it has proved useful in many use cases.

    In fact, the simplest gradient operator we could think of is even simpler than the one you suggest above:

    [-1 1]
    

    Despite its simplicity, this operator has a first problem: when you use it, you compute the gradient between two positions and not at one position. If you apply it to 2 pixels (x,y) and (x+1,y), have you computed the gradient at position (x,y) or (x+1,y)? In fact, what you have computed is the gradient at position (x+0.5,y), and working with half pixels is not very handy. That's why we add a zero in the middle:

    [-1 0 1]
    

    Applying this one to pixels (x-1,y), (x,y) and (x+1,y) will clearly give you a gradient for the center pixel (x,y).
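
The half-pixel argument can be checked on a small 1D example. The sketch below uses plain Python lists and a hypothetical helper `correlate1d` (not a library function), with f(x) = x^2 as an assumed test signal so that the true derivative is 2x.

```python
def correlate1d(signal, kernel):
    """Slide `kernel` over `signal` without flipping it, keeping only full overlaps."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Assumed test signal: f(x) = x^2 sampled at x = 0..5, so f'(x) = 2x.
signal = [x * x for x in range(6)]

forward = correlate1d(signal, [-1, 1])     # f(x+1) - f(x)
central = correlate1d(signal, [-1, 0, 1])  # f(x+1) - f(x-1)

print(forward)  # [1, 3, 5, 7, 9]
print(central)  # [4, 8, 12, 16]
```

The forward differences equal 2(x+0.5) for x = 0..4, i.e. the derivative at the half-pixel positions. The central differences equal 4x for x = 1..4, which is twice the derivative at the pixel itself: [-1 0 1] spans two pixels, so dividing by 2 gives the derivative per pixel.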

    This one can also be seen as the sum of two shifted [-1 1] filters: [-1 1 0], which computes the gradient at position (x-0.5,y), just left of the pixel, and [0 -1 1], which computes the gradient at position (x+0.5,y), just right of the pixel.
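
A quick check of how [-1 0 1] relates to the two shifted [-1 1] filters: adding them element-wise reproduces it, and so does convolving [-1 1] with the two-tap averaging filter [1 1]. The helper `convolve_full` is a hypothetical name for a textbook full convolution, written out here to avoid any dependency.

```python
def convolve_full(a, b):
    """Full discrete convolution of two 1D sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

left = [-1, 1, 0]    # gradient at x - 0.5
right = [0, -1, 1]   # gradient at x + 0.5

# Element-wise sum of the two shifted filters.
summed = [l + r for l, r in zip(left, right)]

# Difference filter combined with a two-tap average.
smoothed_difference = convolve_full([-1, 1], [1, 1])

# For contrast: convolving [-1 1] with itself gives a second difference.
second_difference = convolve_full([-1, 1], [-1, 1])

print(summed)               # [-1, 0, 1]
print(smoothed_difference)  # [-1, 0, 1]
print(second_difference)    # [1, -2, 1]
```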

    Now this filter still has another disadvantage: it is very sensitive to noise. That's why we decide not to apply it to a single row of pixels, but to 3 rows: this gives an average gradient over these 3 rows, which softens possible noise:

    [-1 0 1]
    [-1 0 1]
    [-1 0 1]
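
To illustrate the noise argument, here is a minimal sketch on an assumed 3x3 patch that is flat (true gradient 0) except for one noisy pixel in the center row. The helper `apply_kernel` and the patch values are made up for the example.

```python
def apply_kernel(patch, kernel):
    """Correlate a 3x3 patch with a 3x3 kernel: sum of element-wise products."""
    return sum(
        kernel[r][c] * patch[r][c]
        for r in range(3)
        for c in range(3)
    )

# Assumed flat patch with a single noisy pixel (16 instead of 10).
patch = [
    [10, 10, 10],
    [10, 10, 16],
    [10, 10, 10],
]

single_row = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]
three_rows = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

one_row_response = apply_kernel(patch, single_row)       # noise passes through in full
averaged_response = apply_kernel(patch, three_rows) / 3  # mean of the three row gradients

print(one_row_response)   # 6
print(averaged_response)  # 2.0
```

The single-row filter reports a gradient of 6 where there is none; averaging over 3 rows cuts that spurious response to 2.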
    

    But this one tends to average things a little too much: when applied to one specific row, we lose much of the detail of that row. To fix that, we give a little more weight to the center row. This still suppresses noise by taking into account what happens in the previous and next rows, while keeping the specificity of the row itself. That's what gives the Sobel filter:

    [-1 0 1]
    [-2 0 2]
    [-1 0 1]
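
The trade-off between the three filters can be sketched numerically. The patches below are assumptions chosen for illustration, and each response is divided by the sum of the filter's row weights (1, 3 and 4) so the numbers are comparable; the equal-weight 3-row filter is commonly known as the Prewitt operator.

```python
def apply_kernel(patch, kernel):
    """Correlate a 3x3 patch with a 3x3 kernel: sum of element-wise products."""
    return sum(
        kernel[r][c] * patch[r][c]
        for r in range(3)
        for c in range(3)
    )

# (kernel, sum of row weights) for each filter.
filters = {
    "single_row": ([[0, 0, 0], [-1, 0, 1], [0, 0, 0]], 1),
    "three_rows": ([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], 3),
    "sobel": ([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], 4),
}

# Assumed patch with a gradient of 12 in the center row only.
detail_in_center = [[10, 10, 10], [10, 10, 22], [10, 10, 10]]
# Assumed patch with the same gradient in the row above instead.
detail_in_neighbor = [[10, 10, 22], [10, 10, 10], [10, 10, 10]]

center = {
    name: apply_kernel(detail_in_center, k) / w for name, (k, w) in filters.items()
}
neighbor = {
    name: apply_kernel(detail_in_neighbor, k) / w for name, (k, w) in filters.items()
}

print(center)    # {'single_row': 12.0, 'three_rows': 4.0, 'sobel': 6.0}
print(neighbor)  # {'single_row': 0.0, 'three_rows': 4.0, 'sobel': 3.0}
```

In this example Sobel keeps half of the center row's own gradient (6 of 12) where the equal-weight filter keeps a third, and it lets through less of a neighboring row's gradient (3 versus 4).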
    

    Tweaking the coefficients leads to other gradient operators such as the Scharr operator, which gives just a little more weight to the center row:

    [-3  0 3 ]
    [-10 0 10]
    [-3  0 3 ]
    

    There are also mathematical reasons for this, such as the separability of these filters... but I prefer seeing it as an experimental discovery which proved to have interesting mathematical properties, as experiment is in my opinion at the heart of computer vision. Only your imagination is the limit to creating new ones, as long as they fit your needs...
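
The separability mentioned above can be made concrete: each of these 3x3 filters is the outer product of a vertical smoothing column and the horizontal derivative row [-1 0 1]. The sketch below uses a hypothetical `outer` helper and also prints the share of the smoothing weight that goes to the center row.

```python
def outer(column, row):
    """Outer product: element (i, j) is column[i] * row[j]."""
    return [[c * r for r in row] for c in column]

derivative = [-1, 0, 1]

# Vertical smoothing weights; "three_rows" is the equal-weight filter
# (commonly called the Prewitt operator).
smoothing = {
    "three_rows": [1, 1, 1],
    "sobel": [1, 2, 1],
    "scharr": [3, 10, 3],
}

kernels = {name: outer(weights, derivative) for name, weights in smoothing.items()}
center_share = {name: weights[1] / sum(weights) for name, weights in smoothing.items()}

for name, kernel in kernels.items():
    print(name, kernel, round(center_share[name], 3))
```

The center row gets 1/3 of the weight in the equal-weight filter, 1/2 in Sobel and 5/8 in Scharr, which matches the progression described in the answer. Separability also means the 3x3 filter can be applied as two 1D passes (one vertical, one horizontal) instead of one 2D pass.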