
What do the 128 values of a SIFT descriptor represent?


I know that we take a 16x16 window of "in-between" pixels around the keypoint and split that window into sixteen 4x4 windows. From each 4x4 window we generate a histogram of 8 bins, each bin corresponding to 0-44 degrees, 45-89 degrees, etc. The gradient orientations from the 4x4 window are put into these bins. This is done for all sixteen 4x4 blocks, and finally the 128 resulting values are normalized.

But I don't understand where the 128 numbers get their values from. Do they correspond to the gradient magnitude for each orientation, or to something else?

I would be grateful if anyone could give a numerical example. Regards!


Solution

  • In SIFT (Scale-Invariant Feature Transform), the 128-dimensional feature vector comes from a 4x4 grid of cells around the keypoint, with an 8-bin orientation histogram per cell: 4x4x8 = 128.

    For an illustrated guide see A Short introduction to descriptors, and in particular this image, showing 8-direction measurements (cardinal and inter-cardinal) embedded in each of the 4x4 grid squares (center image) and then a histogram of directions (right image):

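    [Image: keypoint window with 8-direction measurements in each cell of the 4x4 grid (center) and the resulting histogram of directions (right)]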

    From your question I believe you are also unclear on what the information inside the descriptor is. Each value comes from a histogram of oriented gradients, the same idea used by the HOG (Histogram of Oriented Gradients) descriptor. For further reading, Wikipedia has an overview of HOG gradient computation:

    Each pixel within the cell casts a weighted vote for an orientation-based histogram channel based on the values found in the gradient computation.

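    Everything is built on those per-pixel "votes": each of the 128 values is the total of the magnitude-weighted votes that fell into one orientation bin of one 4x4 cell, before the final normalization.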
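
Since the question asks for a numerical example, here is a minimal sketch in Python with NumPy. It is a simplified illustration, not the full SIFT algorithm: it assumes a ready-made 16x16 patch, assigns each pixel to a single bin, and leaves out the Gaussian weighting, the interpolation between neighbouring bins and cells, and the rotation to the keypoint's dominant orientation. The function name `simple_sift_descriptor` is made up for this example. The clipping at 0.2 followed by a second normalization is the step described in Lowe's paper.

```python
import numpy as np

def simple_sift_descriptor(patch):
    """Simplified 128-value descriptor for a 16x16 grayscale patch.

    Illustration only: hard bin assignment, no Gaussian weighting,
    no interpolation, no rotation to the dominant orientation.
    """
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")

    # Per-pixel gradients; np.gradient returns d/drow first, then d/dcol.
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0

    # Bin 0 covers [0, 45) degrees, bin 1 covers [45, 90), and so on.
    # The final % 8 guards against an angle that rounds up to exactly 360.
    bins = (angle // 45).astype(int) % 8

    # 4x4 cells, 8 orientation bins per cell.
    hist = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # Each pixel votes into its cell's histogram,
            # and the size of the vote is its gradient magnitude.
            hist[r // 4, c // 4, bins[r, c]] += magnitude[r, c]

    # Flatten to 128 values, normalize, clip at 0.2, normalize again.
    vec = hist.ravel()
    norm = np.linalg.norm(vec)
    if norm > 0:
        vec = np.minimum(vec / norm, 0.2)
        vec = vec / np.linalg.norm(vec)
    return hist, vec

# Horizontal ramp: intensity grows by 1 per column, so every pixel has
# gradient magnitude 1 and orientation 0 degrees.
patch = np.tile(np.arange(16.0), (16, 1))
hist, vec = simple_sift_descriptor(patch)
print(hist[0, 0])   # raw votes of the top-left cell
print(vec[:8])      # the same cell after normalization
```

For this ramp, every cell contains 16 pixels that all vote with magnitude 1 into bin 0, so each cell's raw histogram is 16 in bin 0 and 0 elsewhere. After normalization the 16 non-zero entries each become 0.25 and the other 112 stay at 0. So each of the 128 numbers is an accumulated gradient magnitude for one direction in one cell, rescaled by the normalization.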