classification similarity sift descriptor vlfeat

Get the 5 most similar images

I would like to find the 5 images most similar to an input image. To do this I thought I would use SIFT (from the VLFeat library) and compare the respective descriptors. So I use the vl_ubcmatch function (doc here) to compute a similarity measure between the images.

This is the code:

path_dir = './img/';
imgs = dir(path_dir);
imgs = imgs(~[imgs.isdir]); % drop '.', '..' and any subfolders
numImgs = numel(imgs);
path1 = './img/car01.jpg';
Ia = imread(path1);
Ia = single(rgb2gray(Ia));
[fa, da] = vl_sift(Ia);

results = struct;
m = 0;
j = 1; % index of the next entry in results (loop counter)

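for img = imgs'

    % path2 instead of path, to avoid shadowing MATLAB's built-in path()
    path2 = strcat(path_dir, img.name);
    if ~strcmp(path1, path2) % skip the query image itself
        Ib = imread(path2);
        Ib = single(rgb2gray(Ib));
        [fb, db] = vl_sift(Ib);

        % matches: one column per match, scores: one value per match
        [matches, scores] = vl_ubcmatch(da, db);

        % mean score over all matches of this image pair
        m = mean(scores);

        results(j).measure = m;
        results(j).img = path2;
        j = j + 1;
    end
end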

As you can see from the code, I thought I would use the mean as a measure of similarity, but the results I get are not satisfactory (for example, it tells me that the input image of a cup is more similar to a tree than to another cup).

In your opinion, is it better to have many matching descriptors with low similarity, or fewer matching descriptors with higher similarity? I have 50 images from 5 different categories (cups, trees, people, tables and cars). Given an input image, the program should return the 5 images most similar to it, preferably from the same category.

What measurement can I use instead of the mean to get a more precise classification? Thanks!

Solution

According to your code, you measure the similarity between the query image (Ia) and every other image (Ib). You compare the SIFT descriptors of Ia with those of each Ib, which gives you a list of feature matches for each image pair (matches) and the squared Euclidean distance between the two descriptors of each match (scores).

Using the mean of all scores of an image pair as the similarity measure is not very robust: an image pair with only one feature match could (by chance) get a better "similarity" than an image pair with many matches, which I guess is not the result you want for your task.
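To make the problem concrete, here is a small sketch with made-up numbers (the score values are purely illustrative, not measured from real images). It assumes, as with vl_ubcmatch, that a lower score means a closer match:

```matlab
% Illustrative scores only (made-up values, lower = closer match).
scoresA = 15000;                              % pair A: a single chance match
scoresB = [21000 18000 25000 19500 22500];    % pair B: five matches

meanA = mean(scoresA);    % 15000
meanB = mean(scoresB);    % 21200

countA = numel(scoresA);  % 1 match
countB = numel(scoresB);  % 5 matches

% The mean ranks pair A as "more similar" (lower mean distance),
% although it is supported by only one match.
fprintf('mean:  A = %g, B = %g\n', meanA, meanB);
fprintf('count: A = %d, B = %d\n', countA, countB);
```

Judged by the mean, pair A wins; judged by the number of matches, pair B wins, which is usually what you want here.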

Concerning your question: it is always better to have a few meaningful, robust descriptor matches than a lot of meaningless ones (of course, the more robust matches the better!).

Proposal: why don't you just count the number of inliers (= number of feature matches for each image pair)? Since matches has one column per match, that is size(matches, 2).

This should give more inliers between images of the same object than between images of different objects, so the 5 images with the most inliers against your input image should be the most similar ones.
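The ranking step could look roughly like the sketch below. It assumes the loop from the question stores the match count in results(j).measure instead of the mean. The file names and counts here are invented for illustration, so that the snippet runs without VLFeat:

```matlab
% Made-up match counts for illustration only; the file names are hypothetical.
% In the real loop each count would come from size(matches, 2).
names  = {'cup02.jpg', 'tree01.jpg', 'cup03.jpg', 'car02.jpg', ...
          'person01.jpg', 'cup04.jpg', 'table01.jpg'};
counts = [41, 3, 28, 5, 2, 35, 7];

% Same layout as the results struct in the question: one element per image.
results = struct('img', names, 'measure', num2cell(counts));

% More matches means more similar, so sort in descending order.
[~, order] = sort([results.measure], 'descend');

% Keep the 5 best candidates (or fewer if there are fewer images).
k = min(5, numel(results));
top = results(order(1:k));

for i = 1:k
    fprintf('%d. %s (%d matches)\n', i, top(i).img, top(i).measure);
end
```

Note the sort direction: with match counts, higher is better, whereas with the mean of the scores (distances), lower would be better.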

If you just want to distinguish a cup from a tree, this should work. If your classification task gets more difficult and you need to distinguish, say, different types of trees, SIFT matching is not the best tool. A learning approach will give better results, but that depends on your task.