I am trying to create an application that will match an image of a building facade to an image in my database (see sample image below). I'm implementing the application in Java and so far I have been following this tutorial: http://docs.opencv.org/2.4/doc/tutorials/features2d/feature_homography/feature_homography.html
What I would like help with is how to move on to the next step and adapt my code so that I pull up stored images to match against the input image. How would I do this? Do I store the image itself and the keypoints of each image in the database? Or do I store the Descriptor Matchers?
Any tutorials or examples of an application like this would be greatly appreciated.
You will basically store the features extracted from each reference image, that is, its keypoints and their descriptors.
You don't really have to store every image, but you may want to store at least one per building so you can show the user the best match.
What issues do you have at the moment? The amount of space needed for your database? Speed of matching? Or maybe matching quality? The best approach depends on your answer.
I would try to implement the simplest approach first: just iterate over the data extracted from the reference images in your database and try to match your image against each one. You can select the reference image that yields the most inliers and then check whether that inlier count is higher than some empirically defined threshold to determine whether you have a match.
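The selection step described above can be sketched as follows. This is only an illustration: `BestReferenceSelector`, `selectBest`, and `minInliers` are hypothetical names, and the inlier count for each reference is assumed to come from your own matching code (for instance, by counting the inliers kept by a RANSAC homography estimate, as in the tutorial).

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper: picks the reference image with the most inliers.
final class BestReferenceSelector {

    /**
     * Returns the index of the reference with the highest inlier count,
     * or -1 if even the best one has fewer than minInliers inliers.
     *
     * countInliers is assumed to run descriptor matching plus a geometric
     * check against one reference and return the number of inliers.
     */
    static <R> int selectBest(List<R> references,
                              ToIntFunction<R> countInliers,
                              int minInliers) {
        int bestIndex = -1;
        int bestInliers = -1;
        for (int i = 0; i < references.size(); i++) {
            int inliers = countInliers.applyAsInt(references.get(i));
            if (inliers > bestInliers) {
                bestInliers = inliers;
                bestIndex = i;
            }
        }
        // Empirical threshold: reject the best candidate if it is too weak.
        return bestInliers >= minInliers ? bestIndex : -1;
    }
}
```

The threshold has to be tuned on your own data; a value that works for one detector and descriptor combination will not necessarily carry over to another.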
If you run into performance problems, you can take advantage of the fact that you prepare the database beforehand and precompute something useful. One example would be several k-d trees, or a single k-d tree that holds the features from all images (storing, for each feature, the index of the image it came from). You then perform matching with one modification: allow each keypoint from the source image to match multiple keypoints as long as they come from different reference images. After matching and the geometric tests, check which reference image gets the most matches.
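A rough sketch of the pooled-index idea follows. To stay self-contained it uses a plain linear scan where a real implementation would use a k-d tree, and it leaves out the geometric tests entirely. `PooledFeatureIndex`, `vote`, `bestImage`, and `maxDistance` are hypothetical names chosen for illustration, not OpenCV API.

```java
import java.util.Arrays;

// Hypothetical pooled index: every reference descriptor is stored together
// with the id of the reference image it was extracted from.
final class PooledFeatureIndex {
    private final float[][] descriptors; // descriptors from all reference images
    private final int[] imageIds;        // imageIds[i] = source image of descriptors[i]

    PooledFeatureIndex(float[][] descriptors, int[] imageIds) {
        this.descriptors = descriptors;
        this.imageIds = imageIds;
    }

    /**
     * For each query descriptor, gives one vote to every reference image that
     * has at least one descriptor within maxDistance (Euclidean). A query
     * keypoint may therefore match several images, but each image only once.
     */
    int[] vote(float[][] query, int imageCount, double maxDistance) {
        int[] votes = new int[imageCount];
        double maxSquared = maxDistance * maxDistance;
        boolean[] matched = new boolean[imageCount];
        for (float[] q : query) {
            Arrays.fill(matched, false);
            // Linear scan; a k-d tree would replace this loop in practice.
            for (int i = 0; i < descriptors.length; i++) {
                int id = imageIds[i];
                if (!matched[id] && squaredDistance(q, descriptors[i]) <= maxSquared) {
                    matched[id] = true;
                    votes[id]++;
                }
            }
        }
        return votes;
    }

    /** Returns the index of the image with the most votes, or -1 if there are none. */
    static int bestImage(int[] votes) {
        int best = -1;
        for (int i = 0; i < votes.length; i++) {
            if (votes[i] > 0 && (best < 0 || votes[i] > votes[best])) {
                best = i;
            }
        }
        return best;
    }

    private static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

In a full pipeline the votes would only be a first filter: the matches for the top few candidate images would still go through the geometric check before you pick a winner.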
If you run into memory problems, you can limit the number of feature points per reference image (sort them by score in descending order and keep only the N best). You can also use smaller descriptors (SURF over SIFT, etc.). But I don't think this is a likely scenario: you will need somewhere around 100-1000 features per reference image, and assuming you use the SIFT descriptor with 128 floats you'll get 1000*128*4 = 512,000 bytes, roughly 500 kilobytes per image. Using 200 points per reference image and SURF descriptors with 64 floats will give you about 50 kilobytes per image. You can go even further and store the SURF values as chars (one byte each) to get ~13 kB per image, but matching quality will likely degrade.
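The arithmetic and the "keep the N best" step above can be sketched like this. `FeatureBudget`, `descriptorBytes`, and `topN` are hypothetical names, and the score is assumed to be whatever strength value your detector reports for each keypoint (such as its response).

```java
import java.util.Arrays;

// Hypothetical helpers for sizing and trimming the per-image feature set.
final class FeatureBudget {

    /** Raw descriptor storage for one image, ignoring keypoint metadata. */
    static long descriptorBytes(int features, int dimensions, int bytesPerValue) {
        return (long) features * dimensions * bytesPerValue;
    }

    /**
     * Returns the indices of the n highest-scoring features, best first.
     * If there are fewer than n features, all of them are returned.
     */
    static int[] topN(float[] scores, int n) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Descending sort of indices by score.
        Arrays.sort(order, (a, b) -> Float.compare(scores[b], scores[a]));
        int keep = Math.max(0, Math.min(n, order.length));
        int[] result = new int[keep];
        for (int i = 0; i < keep; i++) {
            result[i] = order[i];
        }
        return result;
    }
}
```

With these helpers, `descriptorBytes(1000, 128, 4)` gives 512000, `descriptorBytes(200, 64, 4)` gives 51200, and `descriptorBytes(200, 64, 1)` gives 12800, which correspond to the three estimates above. The indices returned by `topN` would then be used to select the matching keypoints and descriptor rows before saving them.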