Tags: image-processing, computer-vision, convolution, edge-detection

Assembling a Haar-like filter for edge detection


In the paper located at this link, the section "B. Detecting vertical edge using Haar-like feature" mentions the use of a Haar-like kernel to obtain an image that emphasizes vertical or horizontal edges in the input image, the way the Sobel operator does. The way I understand Haar features is that they output the difference between the sum of pixels under the white rectangles and the sum of pixels under the gray/black rectangles. I have trouble determining the anchor point of the resulting Haar kernel (the dimension mentioned in the paper is 6 x 6). My current understanding is that if I take the vertical edge mask mentioned in the paper, the resulting 6 x 6 mask would be
1, 1, 1, -1, -1, -1
1, 1, 1, -1, -1, -1
1, 1, 1, -1, -1, -1
1, 1, 1, -1, -1, -1
1, 1, 1, -1, -1, -1
1, 1, 1, -1, -1, -1
Vertical edge mask

If anyone more knowledgeable about this can help me with this problem (what the kernel looks like and how a convolution would be performed in order to obtain an edge map) I would be very grateful.
Thank you.


Solution

  • Based on the information given in the paper Vehicle Detection Method using Haar-like Feature on Real Time System, I can't tell exactly how the authors did it. However, I can suggest a way in which this could be implemented.

    The main difference between a Haar-like feature and a convolution kernel is that the Haar-like feature has a 'fixed position' within the image, while a kernel is applied to each pixel.

    A convolution kernel usually maps a local neighbourhood of a pixel to a value between 0 and 255. Haar-like features, however, define a mapping from the entire image to a single value: the sum of all pixels, each biased by -1, 0 or 1. This bias depends on the position of the pixel in the image.
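
    A minimal sketch of evaluating one such feature as a biased sum over the whole image (NumPy assumed; the image size and rectangle placement are just illustrative):

    import numpy as np

    def haar_feature(image, bias_map):
        # bias_map has the same shape as the image and contains -1, 0 or +1;
        # the feature value is the sum of all pixels weighted by their bias.
        return float(np.sum(image.astype(np.float64) * bias_map))

    # Example: a 6 x 6 two-rectangle feature fixed at the top-left of a 24 x 24 image.
    image = np.random.randint(0, 256, (24, 24))
    bias = np.zeros((24, 24))
    bias[0:6, 0:3] = 1     # 'white' rectangle
    bias[0:6, 3:6] = -1    # 'black' rectangle
    value = haar_feature(image, bias)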

    Converting a convolution kernel into a set of features

    That said, we can extend a convolution kernel for a single pixel X and make it look like a Haar-like feature by saying: we map the entire image by computing the sum of all pixels, biased by what the kernel gives for the neighbourhood of X and by 0 everywhere else. If we do this for every pixel in the image we end up with width*height features; grouping them together gives a large feature vector. Clearly they hold the same amount of information a convolution would have. However, we lose the ability to easily access the 'origin' of a feature, i.e. the pixel the kernel was attached to in order to compute that feature. This information is 'encoded' within the definition of the feature.
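
    As a sketch of this construction (NumPy assumed; for simplicity the kernel is anchored with its top-left corner at the pixel and border positions are skipped):

    import numpy as np

    def feature_map_for_pixel(kernel, image_shape, y, x):
        # Bias map that applies `kernel` around pixel (y, x) and is 0 everywhere else.
        kh, kw = kernel.shape
        fmap = np.zeros(image_shape)
        fmap[y:y + kh, x:x + kw] = kernel
        return fmap

    kernel = np.hstack([np.ones((6, 3)), -np.ones((6, 3))])
    shape = (24, 24)
    # One feature per (interior) pixel; together they carry the same
    # information as convolving the image with the kernel.
    features = [feature_map_for_pixel(kernel, shape, y, x)
                for y in range(shape[0] - 5) for x in range(shape[1] - 5)]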

    Converting a set of features into a convolution kernel

    So, can we reverse the above process? For general features this cannot be done. In the case of the features given above we can take the non-zero pixel values as the convolution kernel. If the feature set is chosen well we will end up with the same kernel for each feature; then yes, we can get back to a kernel from the feature set. Furthermore, we can find the 'origin' of a feature by averaging the positions of all non-zero pixels in the feature's map.
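
    A small sketch of this reconstruction (NumPy assumed), recovering the kernel and the feature's 'origin' from one feature map:

    import numpy as np

    def kernel_and_origin(fmap):
        # The kernel is the bounding box of the non-zero entries; the 'origin'
        # is the average position of those entries.
        ys, xs = np.nonzero(fmap)
        kernel = fmap[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        origin = (ys.mean(), xs.mean())
        return kernel, origin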

    In the case of our Haar-like features the answer is conveniently given in the paper. The convolution kernel is of size 6 x 6 with the left half being 1 and the right half being -1, just as you suggested. (Top being 1 and bottom being -1 for the other mask.) Now, the center of this 6 x 6 kernel would be at (3.5, 3.5), which does not fall on a pixel.
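
    For reference, a sketch of the two masks as NumPy arrays:

    import numpy as np

    # Vertical-edge kernel: left half +1, right half -1.
    kernel_v = np.hstack([np.ones((6, 3)), -np.ones((6, 3))])
    # Horizontal-edge kernel: top half +1, bottom half -1.
    kernel_h = np.vstack([np.ones((3, 6)), -np.ones((3, 6))])
    # Being even-sized, neither kernel has a centre that falls on a pixel.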

    Furthermore, the map defined by the kernel does not map into 0-255. This can be fixed by scaling the result or by applying a threshold. The second will lose some information, but will probably provide more stable results.
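
    Both options could look roughly like this (NumPy assumed; the threshold value is arbitrary):

    import numpy as np

    def rescale(responses):
        # Map the absolute responses linearly into 0-255.
        r = np.abs(responses).astype(np.float64)
        return (255 * r / r.max()).astype(np.uint8)

    def threshold(responses, t=200):
        # Keep only responses above t; discards magnitude information.
        return np.where(np.abs(responses) > t, 255, 0).astype(np.uint8)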

    The key question now is which features were chosen in order to obtain the "haar-like feature edge" image seen in their Fig. 5? In other words, at which positions does the kernel have to be evaluated? The best bet is to attach the kernel to every corner of 4 pixels in the image. This way a detected edge lives between pixels, which makes sense since an edge is defined by the gradient between two (or more) pixels. Note that the resulting image will have one row and one column less than the original image. However, the resulting values behave like boundaries, and everything between two boundaries is 'the same object'.
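
    One way to sketch this corner-anchored evaluation (NumPy and SciPy assumed): padding by 2 pixels on every side before a 'valid' convolution yields exactly one value per corner, i.e. an output one row and one column smaller than the input.

    import numpy as np
    from scipy.signal import convolve2d

    def haar_edge_map(image, kernel):
        # Pad by 2 so the 6 x 6 window can be centred on every corner between
        # 4 pixels; 'valid' then returns an (H-1) x (W-1) response map.
        padded = np.pad(image.astype(np.float64), 2, mode='reflect')
        # convolve2d flips the kernel, which for this mask only flips the sign
        # of the response; take the absolute value for an edge magnitude.
        return np.abs(convolve2d(padded, kernel, mode='valid'))

    kernel_v = np.hstack([np.ones((6, 3)), -np.ones((6, 3))])
    image = np.random.randint(0, 256, (100, 100))
    edges = haar_edge_map(image, kernel_v)   # shape (99, 99)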

    Another way this can be done is to choose the (3,3) position of the kernel as the anchor point and attach this point to every pixel in the image, just like eigenchris suggested. You could also choose (1,1) or any other point. However, as eigenchris mentioned, edges will appear slightly off. So to make a correct classification and to select the right regions of the original image you would have to account for the distance between the (true) center of the kernel and the anchor you chose.
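
    With OpenCV this per-pixel variant could be sketched as follows (the file name input.png is just a placeholder; filter2D computes a correlation, so the kernel is not flipped):

    import numpy as np
    import cv2

    kernel_v = np.hstack([np.ones((6, 3), np.float32), -np.ones((6, 3), np.float32)])
    image = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # anchor is given as (x, y); choosing (3, 3) shifts the response by half a
    # pixel relative to the true centre, so detected edges appear slightly off.
    edges = cv2.filter2D(image, cv2.CV_32F, kernel_v, anchor=(3, 3))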