matlab cluster-analysis k-means

Getting the index of the closest data point to each centroid in k-means clustering in MATLAB


I am doing some clustering using k-means in MATLAB. As you might know, the usage is as follows:

[IDX,C] = kmeans(X,k)

where IDX gives the cluster number for each data point in X, and C gives the centroid of each cluster. For each cluster, I need the index (the row number in the original data set X) of the data point closest to its centroid. Does anyone know how I can do that? Thanks


Solution

  • The "brute-force approach", as mentioned by @Dima, goes as follows:

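    %# preallocate one slot per cluster (avoids growing the array in the loop)
    closestIdx = zeros(1,max(IDX));
    %# loop through all clusters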
    for iCluster = 1:max(IDX)
        %# find the points that are part of the current cluster
        currentPointIdx = find(IDX==iCluster);
        %# find the index (among points in the cluster) of the point
        %# with the smallest Euclidean distance from the centroid:
        %# bsxfun subtracts the centroid from each point, the squared
        %# differences are summed per row (squared distance), and min
        %# picks the smallest (same minimizer as the unsquared distance)
        [~,minIdx] = min(sum(bsxfun(@minus,X(currentPointIdx,:),C(iCluster,:)).^2,2));
        %# store the index into X (among all the points)
        closestIdx(iCluster) = currentPointIdx(minIdx);
    end
    

    To get the coordinates of the point that is closest to the center of cluster k, use

    X(closestIdx(k),:)
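
The same lookup can also be sketched without searching each cluster separately: build an n-by-k matrix of squared distances, mask out the points that belong to other clusters, and take the column-wise minimum. The snippet below is a minimal sketch, not a definitive implementation. The toy values of X and IDX are made up for illustration, and C is computed from them by hand so that the example runs without calling kmeans. With real data you would use the IDX and C returned by kmeans instead. kmeans can also return a point-to-centroid distance matrix as its fourth output, [IDX,C,sumd,D] = kmeans(X,k), which could stand in for the D computed here.

```matlab
%# toy data (illustrative values only): 6 points in 2-D, two clusters
X   = [0 0; 1 0; 0 2; 10 10; 11 10; 10 13];
IDX = [1; 1; 1; 2; 2; 2];                        %# assumed cluster labels
C   = [mean(X(IDX==1,:),1); mean(X(IDX==2,:),1)]; %# centroids, one row each

k = size(C,1);
n = size(X,1);

%# squared Euclidean distance from every point to every centroid (n-by-k)
D = zeros(n,k);
for iCluster = 1:k
    D(:,iCluster) = sum(bsxfun(@minus,X,C(iCluster,:)).^2,2);
end

%# ignore points that belong to a different cluster
D(bsxfun(@ne,IDX,1:k)) = Inf;

%# row index into X of the closest member of each cluster (1-by-k)
[~,closestIdx] = min(D,[],1);
```

As in the loop version, X(closestIdx(k),:) gives the coordinates of the closest point for cluster k. The masking step matters because the point nearest to a centroid over the whole data set is not guaranteed to be a member of that centroid's cluster. An empty cluster would leave its column all Inf and min would then return index 1, so that case needs separate handling if it can occur.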