Is there any advantage to transforming an image before computing SIFT features? For example, I am trying to match a "target" image of a banana:
...to a "scene" image which also contains a banana, but in some unknown orientation and perspective.
First approach: extract SIFT features from the target image, match them to SIFT features in the scene image and compute a homography.
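For concreteness, approach 1 would look roughly like this in OpenCV (the file names are placeholders, and the 0.75 ratio threshold is just the commonly used value):

```python
import cv2
import numpy as np

# Placeholder file names for the target (banana) and scene images.
target = cv2.imread("banana_target.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_t, des_t = sift.detectAndCompute(target, None)
kp_s, des_s = sift.detectAndCompute(scene, None)

# Match target descriptors to scene descriptors, keeping only matches
# that pass the usual ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des_t, des_s, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Estimate the homography from the surviving correspondences
# (needs at least 4 good matches).
src_pts = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst_pts = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
```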
Second approach: transform the target image in various ways to simulate changes of perspective:
...before extracting SIFT features from each transformed image. Combine the extracted features, then match them against the scene and compute a homography, as sketched below.
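A rough sketch of approach 2 (again OpenCV; the particular warps below are only examples, not the ones I actually plan to use): warp the target with a few perspective transforms, extract SIFT from each warp, map the keypoint locations back into the original target frame, and pool everything before matching.

```python
import cv2
import numpy as np

target = cv2.imread("banana_target.png", cv2.IMREAD_GRAYSCALE)
h, w = target.shape
sift = cv2.SIFT_create()

corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
# Example warps only: identity plus two mild perspective skews.
warps = [
    corners,
    np.float32([[20, 0], [w - 20, 10], [w, h], [0, h]]),
    np.float32([[0, 10], [w, 0], [w - 30, h], [30, h - 10]]),
]

all_pts, all_des = [], []
for dst in warps:
    M = cv2.getPerspectiveTransform(corners, dst)
    warped = cv2.warpPerspective(target, M, (w, h))
    kp, des = sift.detectAndCompute(warped, None)
    if des is None:
        continue
    # Map keypoints back into the original target's coordinate frame so the
    # homography to the scene is still expressed against the unwarped target.
    pts = np.float32([k.pt for k in kp]).reshape(-1, 1, 2)
    all_pts.append(cv2.perspectiveTransform(pts, np.linalg.inv(M)).reshape(-1, 2))
    all_des.append(des)

combined_pts = np.vstack(all_pts)   # target-frame keypoint locations
combined_des = np.vstack(all_des)   # pooled descriptors, matched as in approach 1
```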
Is there any advantage to approach 2, in terms of fidelity of feature matching?
I'd guess no, but you never know until you try. SIFT is about as good as it gets when it comes to reliability. If there were any real benefit to it, I'd guess someone would already have implemented it as an improved algorithm.
I guess it also depends on the size of the blobs the algorithm detects. I'm more familiar with SURF, but I know SIFT works similarly: both algorithms detect blobs at different scales. When the perspective changes, I'd guess the larger blobs will fail to match, while the smaller blobs will keep matching well.
Also, if you transform the image and then extract features, and the transform isn't significant enough, the transformed descriptor will be nearly identical to the original one, and the matching algorithm may end up discarding both. That's because the usual ratio test only keeps a match when the best candidate is X times closer than the next best, so two near-duplicate candidates will knock each other out.
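Here's a toy sketch of what I mean (the descriptors and the 0.75 threshold are made up for illustration):

```python
import numpy as np

def ratio_test(query, candidates, ratio=0.75):
    """Return the index of the best match, or None if the two best are too close."""
    dists = np.linalg.norm(candidates - query, axis=1)
    best, second = np.argsort(dists)[:2]
    return best if dists[best] < ratio * dists[second] else None

query = np.array([1.0, 0.0])

# Candidate set containing an original descriptor plus a near-identical copy
# from a mildly transformed image: the ratio test rejects the match entirely.
candidates = np.array([[1.0, 0.05], [1.0, 0.06], [5.0, 5.0]])
print(ratio_test(query, candidates))   # None: best and second-best are too close

# Without the near-duplicate, the same query matches fine.
print(ratio_test(query, np.array([[1.0, 0.05], [5.0, 5.0]])))   # 0
```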