Tags: opencv, computer-vision, sift, object-detection

SIFT matches and recognition?


I am developing an application where I am using SIFT + RANSAC and a homography to find an object (OpenCV, C++/Java). The problem I am facing is that when there are many outliers, RANSAC performs poorly.

For this reason I would like to try what the author of SIFT (David Lowe) said works pretty well: Hough voting.

I have read that we should vote in a 4-dimensional feature space, where the dimensions are:

  • Location [x, y] (sometimes called translation; this accounts for two dimensions)
  • Scale
  • Orientation

While with OpenCV it is easy to get a match's scale and orientation with:

cv::KeyPoint::octave
cv::KeyPoint::angle

I am having a hard time understanding how I can calculate the location.
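For reference, here is how I am reading the raw per-keypoint values at the moment (a minimal sketch; kp1, kp2 and m come from my matcher, and I use cv::KeyPoint::size as the scale because for SIFT the octave field is bit-packed):

// kp1/kp2: keypoints of the two images, m: one cv::DMatch between them
const cv::KeyPoint& k1 = kp1[m.queryIdx];
const cv::KeyPoint& k2 = kp2[m.trainIdx];
cv::Point2f loc = k1.pt;   // location [x, y] in image 1
float scale = k1.size;     // keypoint neighbourhood diameter
float angle = k1.angle;    // orientation in degrees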

I have found an interesting slide where, with only one match, we are able to draw a bounding box.

But I don't get how I could draw that bounding box with just one match. Any help?


Solution

  • You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.

    Let's say you have matched two features: f1 from image 1 and f2 from image 2. Let (x1, y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be its orientation. Similarly, you have (x2, y2), s2, and theta2 for f2.

    The translation between two features is (dx,dy) = (x2-x1, y2-y1).

    The scale change between two features is ds = s2 / s1.

    The rotation between two features is d_theta = theta2 - theta1.

    So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
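
    A minimal sketch of that voting step in OpenCV C++ could look like this (kp1, kp2 and matches are assumed to come from your SIFT matcher, the bin widths are illustrative rather than tuned, and cv::KeyPoint::size stands in for the scale):

        #include <opencv2/core.hpp>
        #include <array>
        #include <cmath>
        #include <map>
        #include <vector>

        // Vote every match into a coarse 4-D histogram over
        // (dx, dy, log2(ds), d_theta) and return the fullest bin.
        std::array<int, 4> bestSimilarityBin(
            const std::vector<cv::KeyPoint>& kp1,
            const std::vector<cv::KeyPoint>& kp2,
            const std::vector<cv::DMatch>& matches)
        {
            const float binXY = 32.f;    // translation bin width, pixels
            const float binLogS = 0.5f;  // scale bin width, log2 units
            const float binTheta = 30.f; // rotation bin width, degrees

            std::map<std::array<int, 4>, int> votes;
            for (const cv::DMatch& m : matches) {
                const cv::KeyPoint& a = kp1[m.queryIdx];
                const cv::KeyPoint& b = kp2[m.trainIdx];
                float dx = b.pt.x - a.pt.x;   // translation
                float dy = b.pt.y - a.pt.y;
                float ds = b.size / a.size;   // scale change
                float dt = b.angle - a.angle; // rotation, degrees
                if (dt < 0.f) dt += 360.f;    // wrap into [0, 360)
                std::array<int, 4> bin = {
                    (int)std::floor(dx / binXY),
                    (int)std::floor(dy / binXY),
                    (int)std::floor(std::log2(ds) / binLogS),
                    (int)std::floor(dt / binTheta) };
                ++votes[bin];
            }

            std::array<int, 4> best{};
            int bestCount = 0;
            for (const auto& v : votes)
                if (v.second > bestCount) { best = v.first; bestCount = v.second; }
            return best;
        }

    Note that Lowe's paper additionally votes into the two nearest bins in each dimension to soften quantization effects, and then verifies the winning cluster with a least-squares fit; the sketch keeps only the core idea.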

    Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
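
    To make that last step concrete, here is one possible sketch in OpenCV C++ (the rotation and scaling are applied about the origin of image 1, and depending on your angle conventions the sign of d_theta may need flipping):

        #include <opencv2/core.hpp>
        #include <opencv2/imgproc.hpp>
        #include <vector>

        // Map the corners of image 1 through the similarity transform
        // (dx, dy, ds, dThetaDeg) recovered from the winning bin.
        std::vector<cv::Point2f> projectBoundingBox(
            cv::Size img1, float dx, float dy, float ds, float dThetaDeg)
        {
            std::vector<cv::Point2f> corners = {
                {0.f, 0.f},
                {(float)img1.width, 0.f},
                {(float)img1.width, (float)img1.height},
                {0.f, (float)img1.height} };
            // 2x3 matrix: rotation by dThetaDeg and scaling by ds about
            // the origin; the translation goes into the last column.
            cv::Mat M = cv::getRotationMatrix2D(cv::Point2f(0.f, 0.f), dThetaDeg, ds);
            M.at<double>(0, 2) += dx;
            M.at<double>(1, 2) += dy;
            std::vector<cv::Point2f> box;
            cv::transform(corners, box, M); // applies M in homogeneous form
            return box;
        }

    Drawing the four returned points with cv::line then gives the bounding box in image 2.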