c++ opencv sift

OpenCV SIFT descriptor keypoint radius


I was digging into OpenCV's implementation of SIFT descriptor extraction. I came upon some puzzling code to get the radius of the interest point neighborhood. Below is the annotated code, with variable names changed to be more descriptive:

// keep octave below 256 (255 is 1111 1111)
int octave = kpt.octave & 255;
// if octave is >= 128, ...????
octave = octave < 128 ? octave : (-128 | octave);
// 1/2^absval(octave)
float scale = octave >= 0 ? 1.0f/(1 << octave) : (float)(1 << -octave);
// multiply the point's radius by the calculated scale
float scl = kpt.size * 0.5f * scale;
// the constant sclFactor is 3 and has the following comment:
// determines the size of a single descriptor orientation histogram
float histWidth = sclFactor * scl;
// descWidth is the number of histograms on one side of the descriptor
// the long float is sqrt(2)
int radius = (int)(histWidth * 1.4142135623730951f * (descWidth + 1) * 0.5f);

I understand that this has something to do with converting to the scale from which the interest point was taken (I have read Lowe's paper), but I can't connect the dots to the code. Specifically, I don't understand the first 3 lines and last line.

I need to understand this to create a similar local point descriptor for motion.


Solution

  • I don't understand the first 3 lines

    This SIFT implementation packs several values into the KeyPoint octave attribute. At line 439 you can see:

    kpt.octave = octv + (layer << 8) + (cvRound((xi + 0.5)*255) << 16);
    

    This means the octave is stored in the lowest byte, the layer in the second byte, and so on.

    So kpt.octave & 255 (found in the unpackOctave function) masks off the upper bytes and keeps only the octave value.
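
    As a rough illustration of that byte layout, here is a minimal sketch. The helper names (packKeypointOctave, octaveField, layerField, offsetField) are hypothetical and are not OpenCV functions; only the bit layout is taken from the line quoted above, and xiByte stands in for cvRound((xi + 0.5)*255).

```cpp
// Hypothetical helpers (not part of OpenCV) mirroring the packing expression
// octv + (layer << 8) + (xiByte << 16) quoted above.
constexpr int packKeypointOctave(int octv, int layer, int xiByte) {
    return octv + (layer << 8) + (xiByte << 16);
}

// Lowest byte: the octave index.
constexpr int octaveField(int packed) { return packed & 255; }

// Second byte: the layer within the octave.
constexpr int layerField(int packed) { return (packed >> 8) & 255; }

// Third byte: the quantized sub-layer offset.
constexpr int offsetField(int packed) { return (packed >> 16) & 255; }
```

    For instance, packing octv = 3, layer = 2 and xiByte = 128 and then applying the three field helpers returns 3, 2 and 128 again.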

    Also, this SIFT implementation uses a negative first octave (int firstOctave = -1) to work with a higher-resolution image. Since the octave indices start at 0, they are remapped as follows:

    octave index = 0 => 255
    octave index = 1 => 0
    octave index = 2 => 1
    ...
    

    This mapping is computed at line 790:

    kpt.octave = (kpt.octave & ~255) | ((kpt.octave + firstOctave) & 255);
    

    Thus the second line of your snippet simply maps these values back:

    octave = 255 => -1
    octave = 0   => 0
    octave = 1   => 1
    ...
    

    The third line computes the scale. Negative octaves give a scale > 1: for example, 1 << -octave gives 2 for octave = -1, which doubles the size.
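
    The mapping and its inverse can be sketched as follows. The function names are hypothetical (they are not OpenCV's); the expressions are the ones quoted above and in the question.

```cpp
// Hypothetical helpers (not OpenCV API) built from the expressions quoted above.

// Forward mapping: shift the octave index by firstOctave and keep it in the low byte.
constexpr int shiftByFirstOctave(int packed, int firstOctave) {
    return (packed & ~255) | ((packed + firstOctave) & 255);
}

// Inverse: read the low byte as a signed 8-bit value, so 255 becomes -1.
// ORing with -128 sets every bit above bit 6, which is a manual sign extension.
constexpr int signedOctave(int packed) {
    return (packed & 255) < 128 ? (packed & 255) : (-128 | (packed & 255));
}

// Scale factor back to the original image: 1/2^octave for octave >= 0,
// 2^(-octave) for negative octaves.
constexpr float octaveScale(int octave) {
    return octave >= 0 ? 1.0f / (1 << octave)
                       : static_cast<float>(1 << -octave);
}
```

    With firstOctave = -1, octave index 0 is stored as 255, signedOctave reads it back as -1, and octaveScale(-1) is 2.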

  • [I don't understand] the last line

    It is the radius of the circle that circumscribes a square patch of side D, hence the sqrt(2) and the division by 2. D is computed by multiplying:

    • the keypoint scale,
    • a magnification factor = 3,
    • the number of histograms per side of the descriptor = 4, plus 1 because interpolation extends the support by half a bin on each side (hence the +1)

    You can find a detailed description in vlfeat's SIFT implementation:

    The support of each spatial bin has an extension of SBP = 3sigma pixels, where sigma is the scale of the keypoint. Thus all the bins together have a support SBP x NBP pixels wide. Since weighting and interpolation of pixel is used, the support extends by another half bin. Therefore, the support is a square window of SBP x (NBP + 1) pixels. Finally, since the patch can be arbitrarily rotated, we need to consider a window 2W += sqrt(2) x SBP x (NBP + 1) pixels wide.
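
    Putting the factors together, the radius from the question's last line can be sketched like this. descriptorRadius is a hypothetical name, and the defaults (descWidth = 4, sclFactor = 3) are the values mentioned in the question and above.

```cpp
// Hypothetical helper (not OpenCV API) reproducing the question's last lines.
//   keypointSize : kpt.size, the keypoint diameter
//   octaveScale  : scale factor derived from the signed octave
//   descWidth    : number of histograms per side of the descriptor
//   sclFactor    : magnification factor for one histogram's width
constexpr int descriptorRadius(float keypointSize, float octaveScale,
                               int descWidth = 4, float sclFactor = 3.0f) {
    // histWidth = sclFactor * (size / 2 * scale) is the side of one spatial bin.
    // The patch side is D = histWidth * (descWidth + 1); the circle that
    // contains the patch at any rotation has radius D * sqrt(2) / 2.
    return static_cast<int>(sclFactor * (keypointSize * 0.5f * octaveScale)
                            * 1.4142135623730951f * (descWidth + 1) * 0.5f);
}
```

    For example, a keypoint with size 4 at scale 1 has histWidth = 6, so D = 30 and the radius is 30 * sqrt(2) / 2, about 21.2, truncated to 21.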

    Finally, I highly recommend reading this vlfeat SIFT documentation.