Tags: matlab, image-processing, computer-vision, feature-extraction, vlfeat

Smoothing the image before feature extraction


I am using the vl_phow function (http://www.vlfeat.org/matlab/vl_phow.html) and am wondering why smoothing is applied before feature extraction.

To be more specific, the documentation of vl_phow states:

The image is smoothed by a Gaussian kernel of standard deviation SIZE / MAGNIF. Note that, in the standard SIFT descriptor, the magnification value is 3; here the default one is 6 as it seems to perform better in applications.

So why is this smoothing operation done?

Also, the same documentation lists a WindowSize option, which is described as follows: "Size of the Gaussian window in units of spatial bins." Is this window used to smooth the image, or for something else?

Can you please tell me why this is done, and what the advantages are of smoothing the image before feature extraction?


Solution

  • While this is more a math question than a MATLAB question, I'll go for it anyway.

    SIFT features are supposed to be points that "stand out" in the image. They are features that have high information content, and that will be "invariant" in different images.

    However, a noisy image may have "noise" that looks like something important. Basic example:

    [0 0 0 0 0
     0 0 0 2 2
     0 1 0 2 2
     0 0 0 0 0]
    

    Without smoothing, one may think that there are 2 areas with loads of information there, the area with 2s and the area with a single 1. However, while the area with 2s does seem like part of the information, as there are a lot of them together, the part with a single 1 may be just noise, a small random value added there due to the noise in the imaging technology.

    If you smooth the image with a filter, you'd get something like this (made-up example):

    [0 0    0 0   0
     0 0    0 1.9 2
     0 0.01 0 1.9 2
     0 0    0 0   0]
    

    Here it is much more obvious that the 1 was just noise, while the 2s remain.
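
    To see what real smoothing does to the toy matrix, here is a minimal MATLAB sketch. It assumes a Gaussian kernel with a standard deviation of 1 pixel and zero padding at the borders; both are choices made for this illustration, not something vl_phow prescribes, and only the base function conv2 is used. The numbers will not match the made-up ones above, because on such a tiny image everything gets attenuated. The point is that the isolated 1 is attenuated proportionally more than the block of 2s.

    ```matlab
    % Toy image from the example: an isolated 1 (likely noise) and a 2x2 block of 2s.
    I = [0 0 0 0 0
         0 0 0 2 2
         0 1 0 2 2
         0 0 0 0 0];

    sigma = 1;                        % assumed standard deviation, in pixels
    r = ceil(3 * sigma);              % kernel radius covering about 3 sigma
    x = -r:r;
    g = exp(-x.^2 / (2 * sigma^2));   % 1-D Gaussian samples
    g = g / sum(g);                   % normalize so the 1-D kernel sums to 1
    K = g' * g;                       % separable 2-D Gaussian kernel

    S = conv2(I, K, 'same');          % smoothing with zero padding at the borders

    noisePeak  = S(3, 2);                 % value where the isolated 1 was
    signalPeak = max(max(S(2:3, 4:5)));   % strongest value inside the block of 2s
    fprintf('isolated pixel: %.3f, block of 2s: %.3f\n', noisePeak, signalPeak);
    ```

    Before smoothing, the block is only twice as strong as the isolated pixel. After smoothing, the ratio signalPeak / noisePeak is larger than 2, which is what makes the block easier to tell apart from the noise.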

    That's why feature extraction algorithms, such as SIFT, generally smooth the image before computing keypoints.

    The bigger the smoothing window, the more robust the keypoints you find will be, as smaller structures are removed. However, you will also find fewer keypoints. And if you make the window too big, you risk deleting real information (the 2s in the example).
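
    The trade-off can be sketched on the same toy data. The snippet below uses the relation quoted in the question, standard deviation = SIZE / MAGNIF, with the default MAGNIF of 6 mentioned there. The SIZE values are assumed purely for illustration, and the Gaussian kernel with zero padding is again a choice of this sketch rather than VLFeat's actual implementation. The isolated pixel and the block are smoothed separately (convolution is linear, so this is equivalent) to measure how much of each peak survives.

    ```matlab
    noise = zeros(4, 5);  noise(3, 2) = 1;          % the isolated 1 on its own
    block = zeros(4, 5);  block(2:3, 4:5) = 2;      % the 2x2 block of 2s on its own

    magnif = 6;                 % default magnification quoted from the vl_phow documentation
    sizes  = [4 6 8 10];        % illustrative SIZE values (assumed for this sketch)
    sigmas = sizes / magnif;    % smoothing standard deviation, SIZE / MAGNIF

    noiseKept  = zeros(size(sigmas));
    signalKept = zeros(size(sigmas));
    for k = 1:numel(sigmas)
        s = sigmas(k);
        r = ceil(3 * s);                    % kernel radius covering about 3 sigma
        x = -r:r;
        g = exp(-x.^2 / (2 * s^2));
        g = g / sum(g);                     % normalized 1-D Gaussian
        K = g' * g;                         % separable 2-D kernel

        Sn = conv2(noise, K, 'same');
        Sb = conv2(block, K, 'same');
        noiseKept(k)  = max(Sn(:)) / 1;     % fraction of the isolated peak that survives
        signalKept(k) = max(Sb(:)) / 2;     % fraction of the block peak that survives
        fprintf('sigma = %.2f: isolated pixel keeps %.2f, block keeps %.2f\n', ...
                s, noiseKept(k), signalKept(k));
    end
    ```

    Both fractions shrink as sigma grows. The isolated pixel fades faster, but the block is eroded as well, which is the risk of choosing too large a window.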