Tags: matlab, k-means, nearest-neighbor, vlfeat

Assign descriptors to cluster centers after creating clusters using VLFeat


I'm clustering my data with k-means, but instead of the standard algorithm I'm using an approximate nearest neighbours (ANN) algorithm to accelerate the sample-to-center comparisons. This is easy to do with the following:

[clusterCenters, trainAssignments] = vl_kmeans(trainDescriptors, clusterCount, 'Algorithm', 'ANN', 'MaxNumComparisons', ceil(clusterCount / 50));

When I run this code, the descriptors in 'trainDescriptors' are clustered and each one is assigned to one of the 'clusterCenters' using ANN.

I also have another variable, 'testDescriptors', which I want to assign to the same cluster centres. This assignment must be done with the same approach used for 'trainDescriptors', but AFAIK the vl_kmeans function does not return the tree that it builds for fast assignment.

So my question is: is it possible to assign 'testDescriptors' to 'clusterCenters' in the same way that vl_kmeans assigns 'trainDescriptors' to them? If yes, how can I do that?


Solution

  • Well, I've figured it out. It can be done as follows:

    clusterCount = 1024;
    datasetTrain = single(rand(128, 100000)); 
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % 1 - cluster train data and get train assignments
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    [clusterCenters, trainAssignments_actual] = vl_kmeans(datasetTrain, clusterCount, ...
        'Algorithm', 'ANN', ...
        'Distance', 'l2', ...
        'NumRepetitions', 1, ...
        'NumTrees', 3, ...
        'MaxNumComparisons', ceil(clusterCount / 50) ...
    );
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % 2 - assign train data to cluster centers
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    forest = vl_kdtreebuild(clusterCenters, ...
        'Distance', 'l2', ...
        'NumTrees', 3 ...
    );
    
    trainAssignments_expected = vl_kdtreequery(forest, clusterCenters, datasetTrain);
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % 3 - validate second assignment
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    validation = isequal(trainAssignments_actual, trainAssignments_expected);
    

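    In step 2 I build a new kd-tree forest from the cluster centres and then assign the data to the centres again by querying that forest. It gives a valid result: 'validation' in step 3 comes out true. The same forest can be queried with 'testDescriptors' in place of 'datasetTrain'.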
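
    The question was about 'testDescriptors', so here is a minimal sketch of that last step. It assumes VLFeat is on the MATLAB path. The data sizes are small stand-ins chosen for speed, and 'testDescriptors' is random data invented for illustration. Passing a comparison bound to vl_kdtreequery (to mirror the bound given to vl_kmeans) is my assumption about how to stay close to the ANN behaviour; the option name 'MaxComparisons' is the one I remember from the VLFeat kd-tree tutorial, so check it against `help vl_kdtreequery` for your version. Since the search is approximate, the sketch reports the fraction of agreement with an unbounded query rather than relying on isequal.

    ```matlab
    % Stand-in sizes and random data (hypothetical, for illustration only).
    clusterCount     = 64;
    trainDescriptors = rand(16, 5000, 'single');
    testDescriptors  = rand(16, 1000, 'single');   % same dimension and class as training data
    maxComparisons   = ceil(clusterCount / 50);

    % Cluster the training data with the ANN variant, as in the answer.
    clusterCenters = vl_kmeans(trainDescriptors, clusterCount, ...
        'Algorithm', 'ANN', ...
        'Distance', 'l2', ...
        'NumTrees', 3, ...
        'MaxNumComparisons', maxComparisons);

    % Build a kd-tree forest over the centers with matching settings.
    forest = vl_kdtreebuild(clusterCenters, ...
        'Distance', 'l2', ...
        'NumTrees', 3);

    % Approximate assignment of the test data (bounded search).
    % Option name is an assumption; verify with `help vl_kdtreequery`.
    testAssignments = vl_kdtreequery(forest, clusterCenters, testDescriptors, ...
        'MaxComparisons', maxComparisons);

    % Same query without a bound, for comparison.
    unboundedAssignments = vl_kdtreequery(forest, clusterCenters, testDescriptors);

    % Fraction of test descriptors on which the two queries agree.
    agreement = mean(testAssignments == unboundedAssignments);
    fprintf('Bounded and unbounded queries agree on %.1f%% of test descriptors\n', ...
        100 * agreement);
    ```

    'testAssignments' has one entry per column of 'testDescriptors', each being the index of a column in 'clusterCenters'. If the agreement is too low for your application, raising the comparison bound trades speed for accuracy.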