Extrema detection in difference of Gaussian images in SIFT

I have a question about the workings of the SIFT algorithm. Say I have a scale-space representation of an image across many octaves, built by convolving the image with Gaussian filters of various sizes. Furthermore, I have computed the difference of Gaussian (DoG) images for each of these octaves.

Let us assume I have 7 DoG images for a given octave. My question is about finding the maxima in these DoG images. According to the literature, one compares each pixel against its 8 neighbours in the same DoG image and against 9 neighbours in each of the two adjacent DoG images.

So, now say I am processing these 7 DoG images and I will start from index 1 and go all the way to index 5. So, something like:

for (int i = 1; i <= 5; ++i)
{ 
   for (int y = 1; y < image_height-1; ++y)
   {
       for (int x = 1; x < image_width-1; ++x)
       {
           current_pixel = image[x, y, i];
           // Compare with the 26 neighbours (8 in this image, 9 in each adjacent DoG image)
           // check if it is an extremum at loc (x, y, i)
       }
   }
}

So, here I iterate through the image and check whether each location is a maximum. My question is this: I will end up with maxima at each of these scales (1 to 5 in my case), so for a given (x, y) location there can be multiple maxima (for example at scales 1, 3 and 5). Is that a problem, or can multiple keypoints be associated with the same spatial location (x, y)? Can someone explain to me how the algorithm proceeds to refine these keypoints?

Solution

You will want to find the extrema across scale as well.

Scale-space extrema detection means finding the extremum for every pixel across "scale" and across "space." Space is the xy-plane in the image. Scale is the index into the pyramid.
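As a rough illustration of what searching "across scale and across space" can look like, here is a minimal C++ sketch of the 26-neighbour test on a 3x3x3 cube. The `DogOctave` container and the `isScaleSpaceExtremum` name are made up for this example and are not taken from any SIFT implementation. The strict comparison (ties are rejected) is also an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: one octave of DoG images in a flat array.
struct DogOctave {
    int width, height, levels;
    std::vector<float> data;  // width * height * levels values, initially 0

    DogOctave(int w, int h, int n)
        : width(w), height(h), levels(n),
          data(static_cast<std::size_t>(w) * h * n, 0.0f) {}

    float& at(int x, int y, int s) {
        return data[(static_cast<std::size_t>(s) * height + y) * width + x];
    }
    float at(int x, int y, int s) const {
        return data[(static_cast<std::size_t>(s) * height + y) * width + x];
    }
};

// True if the value at (x, y, s) is strictly larger than, or strictly
// smaller than, all 26 neighbours: 8 in the same DoG image and 9 in each
// of the two adjacent DoG images.
bool isScaleSpaceExtremum(const DogOctave& dog, int x, int y, int s) {
    // Only interior positions have a full 3x3x3 neighbourhood.
    assert(x >= 1 && x < dog.width - 1);
    assert(y >= 1 && y < dog.height - 1);
    assert(s >= 1 && s < dog.levels - 1);

    const float v = dog.at(x, y, s);
    bool isMax = true, isMin = true;
    for (int ds = -1; ds <= 1; ++ds)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (ds == 0 && dy == 0 && dx == 0) continue;  // skip the centre
                const float n = dog.at(x + dx, y + dy, s + ds);
                if (n >= v) isMax = false;
                if (n <= v) isMin = false;
            }
    return isMax || isMin;
}
```

With 7 DoG images, this test can only be called for s = 1 to 5, which matches the loop bounds in the question. Because the comparison includes the adjacent scales, two directly neighbouring scales cannot both be strict extrema at the same (x, y).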

Why do you want to do this?

The idea of scale-space extrema detection is to find the scale at which a feature has the highest response. For example, if you have a small blob in the image, its extremum will be at a fine scale. At a coarse scale, this small blob will be washed out.

For a large blob, computing the score at a fine scale does not produce an extremum. But if the scale is coarse enough, the blob will stand out. That is, at coarser levels of the pyramid, smaller structures around the large blob will be washed out, and the large blob will stand out.
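The small-blob versus large-blob behaviour can be checked numerically with a small C++ sketch. It makes several simplifying assumptions that are not from the question: the blob is an ideal isotropic 2D Gaussian with unit mass, only the response at the blob centre is evaluated, and the scale levels follow sigma_i = baseSigma * k^i. Smoothing a Gaussian with a Gaussian adds the variances, so the centre value has a closed form and no image is needed. The function names are hypothetical.

```cpp
#include <cmath>

// Centre value of a unit-mass 2D Gaussian blob (std dev blobSigma) after
// smoothing with a Gaussian of std dev sigma. The variances add.
double smoothedBlobCentre(double blobSigma, double sigma) {
    const double pi = 3.14159265358979323846;
    return 1.0 / (2.0 * pi * (blobSigma * blobSigma + sigma * sigma));
}

// Index of the DoG level with the largest absolute response at the blob
// centre. DoG level i is G(k * sigma_i) - G(sigma_i) applied to the blob.
int strongestDogLevel(double blobSigma, double baseSigma, double k,
                      int dogLevels) {
    int best = 0;
    double bestAbs = -1.0;
    for (int i = 0; i < dogLevels; ++i) {
        const double sigma = baseSigma * std::pow(k, i);
        const double dog = smoothedBlobCentre(blobSigma, k * sigma)
                         - smoothedBlobCentre(blobSigma, sigma);
        if (std::fabs(dog) > bestAbs) {
            bestAbs = std::fabs(dog);
            best = i;
        }
    }
    return best;
}
```

With the assumed values baseSigma = 1, k = sqrt(2) and 7 DoG levels, a blob of standard deviation 1.5 responds most strongly at level 1, while a blob of standard deviation 6 responds most strongly at level 5. The fine levels favour the small blob and the coarse levels favour the large one.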