c++ opencv image-processing nearest-neighbor locality-sensitive-hash

Locality Sensitive Hashing in OpenCV for image processing

This is my first image processing application, so please be kind to this filthy peasant.

THE APPLICATION:

I want to implement a fast application (performance is crucial, even over accuracy) that, given a photo (taken with a mobile phone) containing a movie poster, finds the most similar photo in a given dataset and returns a similarity score. The dataset is composed of similar pictures (taken with a mobile phone, containing a movie poster). The images can have different sizes and resolutions and can be taken from different viewpoints (but there is no rotation, since the posters are assumed to always be upright).

Any suggestion on how to implement such an application is welcome.

FEATURE DESCRIPTIONS IN OPENCV:

I've never used OpenCV and I've read this tutorial about Feature Detection and Description by OpenCV.

From what I've understood, these algorithms are supposed to find keypoints (usually corners) and optionally compute descriptors (which describe each keypoint and are used for matching two different images). I say "optionally" since some of them (e.g. FAST) provide only keypoints.

MOST SIMILAR IMAGE PROBLEM AND LSH:

The algorithms above don't solve the problem "given an image, how to find the most similar one in a dataset in a fast way". In order to do that, we can use both the keypoints and the descriptors obtained by any of the previous algorithms. The problem stated above looks like a nearest neighbor problem, and Locality Sensitive Hashing is a fast and popular way to find an approximate solution to it in high-dimensional spaces.

THE QUESTION:

What I don't understand is how to use the result of any of the previous algorithms (i.e. keypoints and descriptors) in LSH.

Is there any implementation for this problem?

Solution

I will provide a general answer, going beyond the scope of the OpenCV library.

Quoting this answer:

descriptors: they are the way to compare the keypoints. They summarize, in vector format (of constant length) some characteristics about the keypoints.

With that said, we can treat a descriptor (geometrically) as a point in a D-dimensional space. So in total, all the descriptors are points in a D-dimensional space. For example, for GIST, D = 960.

So descriptors actually describe the image using less information than the whole image (because when you have 1 billion images, size matters). They serve as the image's representatives, so we process them on behalf of the image (since they are smaller and easier to handle).

The problem you are mentioning is the Nearest Neighbor (NN) problem. Notice that an approximate version of this problem can lead to significant speedups when D is big, since the curse of dimensionality makes traditional approaches, such as a kd-tree, very slow (almost linear in N, the number of points).

Algorithms that solve the NN problem, which is an optimization problem, are usually generic. They do not care whether the data are images, molecules, etc. I, for example, have used my kd-GeRaF for both. As a result, the algorithms expect N points in a D-dimensional space, or N descriptors, if you prefer.
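To make the "N points in a D-dimensional space" view concrete, here is a minimal sketch of the exact baseline: a linear scan over all descriptors. The names `Descriptor`, `squaredDistance` and `nearestNeighbor` are made up for this illustration and do not come from OpenCV or any other library. Each query costs O(N * D), which is the cost that approximate methods try to avoid.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One descriptor = one point in D-dimensional space (illustrative alias).
using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal dimension.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Exact nearest neighbor by linear scan: O(N * D) work per query.
// Returns the index of the closest point; `points` must not be empty.
std::size_t nearestNeighbor(const std::vector<Descriptor>& points,
                            const Descriptor& query) {
    assert(!points.empty());
    std::size_t best = 0;
    float bestDist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < points.size(); ++i) {
        const float d = squaredDistance(points[i], query);
        if (d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}
```

Nothing in this code knows that the vectors came from images, which is exactly why NN algorithms can stay generic.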

Check my answer for LSH here (which points to a nice implementation).

Edit:

LSH expects as input N vectors of dimension D and, given a query vector (also of dimension D) and a range R, finds the vectors that lie within this range of the query vector.
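As a rough illustration of that interface, the sketch below implements one common LSH family for Euclidean distance (random projections quantized into buckets of width w). This is an assumption on my part about which LSH variant to show, not the implementation linked above, and the class name `SimpleLsh` and all parameters (number of tables, hashes per table, bucket width, seed) are hypothetical. A query only inspects points that share a bucket with it in at least one table, then keeps those within range R.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

using Point = std::vector<float>;

// Hypothetical minimal LSH index for Euclidean distance (illustration only).
class SimpleLsh {
    struct Hash { Point a; float b; };  // h(p) = floor((a.p + b) / w)
    struct Table {
        std::vector<Hash> hashes;
        std::unordered_map<std::uint64_t, std::vector<std::size_t>> buckets;
    };
    float width_;
    std::vector<Table> tables_;
    std::vector<Point> points_;

    // Combine the quantized projections of one table into a bucket key.
    std::uint64_t key(const Table& t, const Point& p) const {
        std::uint64_t k = 1469598103934665603ULL;  // FNV-1a style mixing
        for (const Hash& h : t.hashes) {
            assert(h.a.size() == p.size());
            float dot = h.b;
            for (std::size_t i = 0; i < p.size(); ++i) dot += h.a[i] * p[i];
            const auto cell = static_cast<std::int64_t>(std::floor(dot / width_));
            k = (k ^ static_cast<std::uint64_t>(cell)) * 1099511628211ULL;
        }
        return k;
    }

    static float euclidean(const Point& a, const Point& b) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return std::sqrt(sum);
    }

public:
    SimpleLsh(std::size_t dim, std::size_t numTables, std::size_t hashesPerTable,
              float bucketWidth, unsigned seed = 42)
        : width_(bucketWidth), tables_(numTables) {
        assert(bucketWidth > 0.0f);
        std::mt19937 rng(seed);
        std::normal_distribution<float> gauss(0.0f, 1.0f);
        std::uniform_real_distribution<float> offset(0.0f, bucketWidth);
        for (Table& t : tables_) {
            t.hashes.resize(hashesPerTable);
            for (Hash& h : t.hashes) {
                h.a.resize(dim);
                for (float& x : h.a) x = gauss(rng);
                h.b = offset(rng);
            }
        }
    }

    // Store a point in one bucket per table.
    void insert(const Point& p) {
        const std::size_t id = points_.size();
        points_.push_back(p);
        for (Table& t : tables_) t.buckets[key(t, p)].push_back(id);
    }

    // Ids of stored points within `radius` of `query`, among the points that
    // share a bucket with it. Approximate: true neighbors can be missed.
    std::vector<std::size_t> rangeQuery(const Point& query, float radius) const {
        std::vector<bool> seen(points_.size(), false);
        std::vector<std::size_t> result;
        for (const Table& t : tables_) {
            const auto it = t.buckets.find(key(t, query));
            if (it == t.buckets.end()) continue;
            for (std::size_t id : it->second) {
                if (seen[id]) continue;
                seen[id] = true;
                if (euclidean(points_[id], query) <= radius) result.push_back(id);
            }
        }
        return result;
    }
};
```

Usage would look like `SimpleLsh index(D, 8, 4, w); index.insert(v); index.rangeQuery(q, R);`. More tables raise the chance of finding a true neighbor, while more hashes per table shrink the buckets, so these values would need tuning on real descriptors.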

As a result, we can say that every image is represented by just one vector, in SIFT format for example.

You see, LSH doesn't actually solve the k-NN problem directly; it searches within a range (and can give you the k-NNs, if they are within the range). Read more about R in the Experiments section of High-dimensional approximate nearest neighbo. kd-GeRaF and FLANN solve the k-NN problem directly.
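The relation between a range search and k-NN can be sketched as follows: take the ids returned by some range search, rank them by distance to the query, and keep the closest k. The function name `kNearestFromCandidates` is hypothetical and the code is only an illustration, not how kd-GeRaF or FLANN work internally. If fewer than k points fall within the range, fewer than k are returned, which is the caveat mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::vector<float>;

// Squared Euclidean distance between two points of equal dimension.
float squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Given candidate ids (e.g. the output of a range search), return up to k
// of them, ordered from closest to farthest from `query`.
std::vector<std::size_t> kNearestFromCandidates(
        const std::vector<Point>& points,
        const std::vector<std::size_t>& candidates,
        const Point& query, std::size_t k) {
    std::vector<std::pair<float, std::size_t>> scored;
    scored.reserve(candidates.size());
    for (std::size_t id : candidates) {
        assert(id < points.size());
        scored.emplace_back(squaredDistance(points[id], query), id);
    }
    // Only the first `keep` entries need to be in sorted order.
    const std::size_t keep = std::min(k, scored.size());
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(keep),
                      scored.end());
    std::vector<std::size_t> result;
    result.reserve(keep);
    for (std::size_t i = 0; i < keep; ++i) result.push_back(scored[i].second);
    return result;
}
```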