matlab image-processing histogram knn surf

Encode each training image as a histogram of the number of times each vocabulary element shows up for Bag of Visual Words

I want to implement bag of visual words in MATLAB. I used SURF features to extract features from the images and k-means to cluster those features into k clusters. I now have k centroids and I want to know how many times each cluster is used by assigning each image feature to its closet neighbor. Finally, I'd like to create a histogram of this for each image.

I tried to use knnsearch function but it doesn't work in this case.

Here is my MATLAB code:

clc;
clear;
close all;
folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f=dir(filePattern);
files={f.name}; 
for k=1:numel(files)
    fullFileName = fullfile(folder, files{k});
    H = fspecial('log');
    image=imfilter(imread(fullFileName),H);
    temp =  detectSURFFeatures(image);
    [im_features, temp] = extractFeatures(image, temp);
    features{k}= im_features;

end

features = vertcat(features{:});
image_feats = [];
[assignments,centers] = kmeans(double(features),500);
vocab = centers';

I have all images feature in features array and cluster center in centroid array

Solution

You're almost there. You don't even need to use knnsearch at all. The assignments variable tells you which input feature mapped to which cluster. assignments will give you a N x 1 vector where N is the total number of examples you have, or the total number of features in the input matrix features. Each value assignments(i) tells you which cluster the example i (or row i) of features it maps to. The cluster centroid dictated by assignments(i) would be given as centers(i, :). Therefore given how you've called kmeans, it will be a N x 1 vector where each element is from 1 to 500 with 500 being the total number of clusters desired.

Let's do the simple case where we only have one image in your codebook. If this is the case, all you have to do is create a histogram of the assignments variable. The output histogram h will be a 500 x 1 vector with each element h(i) being the number of times an example used centroid i as its representation in your codebook.

Just use the histcounts function and make sure that you specify the bin ranges so that they coincide with each cluster ID. You must make sure that you account for the ending bin, as the bin ranges are exclusive on the right edge so just add an additional bin to the end.

Something like this will work:

h = histcounts(assignments, 1 : 501);

If you want something simpler and you don't want to worry about specifying the end bin, you can use accumarray to achieve the same result:

h = accumarray(assignments, 1);

The effect of accumarray we assign key-value pairs where the key is the centroid that the example mapped to and the value is simply 1 for all keys. accumarray will bin all values in assignments that share the same key and you do something with those values. The default behaviour of accumarray is to sum all values, which is effectively computing the histogram.

However, you want to do this for multiple images, not just a single image. For Bag of Visual Words problems, we will certainly have more than one training image in our database. Therefore, you want to find the histogram of the features for each image. We can still use the above concept, but one thing I can suggest is you maintain a separate variable that tells you how many features were detected per image, then you can index into the assignments variable to help extract out the correct assigned centroid IDs, then build a histogram of those individually. We can build a 2D matrix where each row delineates the histogram of each image. Remember that in kmeans, each row tells you what cluster each example was assigned to independently of the other examples in your data. Using that, you would use kmeans on the entire training dataset, then be smart about how you're accessing the assignments variable to extract out the assigned clusters for each input image.

Therefore, modify your code so that it looks something like this:

clc;
clear;
close all;
folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f=dir(filePattern);
files={f.name}; 
num_features = zeros(numel(files), 1); % New - for keeping track of # of features per image
for k=1:numel(files)
    fullFileName = fullfile(folder, files{k});
    H = fspecial('log');
    image=imfilter(imread(fullFileName),H);
    temp =  detectSURFFeatures(image);
    [im_features, temp] = extractFeatures(image, temp);
    num_features(k) = size(im_features, 1); % New - # of features per image
    features{k}= im_features;    
end

features = vertcat(features{:});
num_clusters = 500; % Added to make the code adaptive
[assignments,centers] = kmeans(double(features), num_clusters);

counter = 1; % Keeps track of where we need to slice in assignments

% Go through each image and find their histograms
features_hist = zeros(numel(files), num_clusters); % Records the per image histograms
for k = 1 : numel(files)
    a = assignments(counter : counter + num_features(k) - 1); % Get the assignments
    h = histcounts(a, 1 : num_clusters + 1);
    % Or:
    % h = accumarray(a, 1).'; % Transpose to make it a row

    % Place in final output
    features_hist(k, :) = h;

    % Increment counter 
    counter = counter + num_features(k);
end

features_hist will now be a N x 500 matrix where each row is the histogram of each image you are seeking. The final job would be to use a supervised machine learning algorithm (SVM, Neural Networks, etc.) where the expected labels is the description of each image you have assigned to the image accompanied by the histogram of each image as the input features. The final result would be a learned model so that when you have a new image, calculate the SURF features, represent them in a histogram of features like we did above, then feed it into the classification model to give you the expected class or label that the image represents.

P.S. Deep Learning / CNNs do a much better job at this, but require much more time to train. If you're looking at performance wise, don't use Bag of Visual Words but this is something very quick to implement and it's known to perform moderately well but that of course depends on the kinds of images you want to classify.