I am doing some clustering using K-means in MATLAB. As you might know the usage is as below:
[IDX,C] = kmeans(X,k)
where IDX gives the cluster number for each data point in X, and C gives the centroids for each cluster.I need to get the index(row number in the actual data set X) of the closest datapoint to the centroid. Does anyone know how I can do that? Thanks
The "brute-force approach", as mentioned by @Dima would go as follows
%# loop through all clusters
for iCluster = 1:max(IDX)
%# find the points that are part of the current cluster
currentPointIdx = find(IDX==iCluster);
%# find the index (among points in the cluster)
%# of the point that has the smallest Euclidean distance from the centroid
%# bsxfun subtracts coordinates, then you sum the squares of
%# the distance vectors, then you take the minimum
[~,minIdx] = min(sum(bsxfun(@minus,X(currentPointIdx,:),C(iCluster,:)).^2,2));
%# store the index into X (among all the points)
closestIdx(iCluster) = currentPointIdx(minIdx);
end
To get the coordinates of the point that is closest to the cluster center k
, use
X(closestIdx(k),:)