Tags: matlab, opencv, cluster-analysis, k-means, vlfeat

K-means Clustering, major understanding issue


Suppose that we have a set of 64-dimensional vectors to cluster; let's say the dataset is a matrix dt of size 64x150.

Using the kmeans function from the VLFeat library, I cluster my dataset into 20 centers:

[centers, assignments] = vl_kmeans(dt, 20);

centers is a 64x20 matrix.

assignments is a 1x150 matrix with values inside it.

According to the manual: The vector assignments contains the (hard) assignments of the input data to the clusters.

I still cannot understand what the numbers in the assignments matrix mean. I don't get it at all. Would anyone mind helping me a bit here? An example would be great. What do these values represent?


Solution

  • In k-means, the problem you are trying to solve is clustering your 150 points into 20 clusters. Each point is 64-dimensional and is therefore represented by a vector of size 64. So in your case dt is the set of points, and each column is one 64-dim vector.

    After running the algorithm you get centers and assignments. centers holds the positions of the 20 cluster centers in the 64-dim space, one per column, in case you want to visualize them, measure distances between points and clusters, etc. assignments, on the other hand, contains the cluster index assigned to each 64-dim point in dt. So if assignments(7) is 15, the 7th vector in dt (its 7th column) belongs to the 15th cluster, whose center is the 15th column of centers.
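
    To make the indexing concrete, here is a minimal sketch in plain MATLAB. It does not call vl_kmeans: the toy data, the hand-picked centers, and the nearest-center loop are all assumptions for illustration, standing in for whatever vl_kmeans would return on real data.

    ```matlab
    % Toy stand-in for dt: 2-dim points, one point per column (5 points).
    dt = [0.0 0.2 5.0 5.1  9.8;
          0.1 0.0 5.2 4.9 10.0];

    % Hand-picked centers (assumed, not computed by k-means), one per column.
    centers = [0 5 10;
               0 5 10];

    % Hard assignment: each point gets the index of its nearest center.
    n = size(dt, 2);
    assignments = zeros(1, n);
    for i = 1:n
        diffs = centers - repmat(dt(:, i), 1, size(centers, 2));
        [~, assignments(i)] = min(sum(diffs .^ 2, 1));
    end

    disp(assignments);                    % 1 1 2 2 3

    % assignments(i) is a column index into centers:
    c4 = centers(:, assignments(4));      % center of the cluster holding point 4

    % Logical indexing pulls out every point of one cluster:
    members2 = dt(:, assignments == 2);   % all points assigned to cluster 2
    ```

    With the output of vl_kmeans, the same two indexing expressions should apply unchanged: centers(:, assignments(7)) would be the center for the 7th point, and dt(:, assignments == 15) would be all points in cluster 15.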

    For example, here you can see a clustering of many 2-D points, let's say 1000, into 3 clusters. In this case dt would be 2x1000, centers would be 2x3, and assignments would be 1x1000, holding numbers ranging from 1 to 3 (or 0 to 2 if you're using OpenCV).

    [Figure: 2-D points clustered into 3 groups by k-means]

    EDIT: The code to produce this image is located here: http://pypr.sourceforge.net/kmeans.html#k-means-example along with a tutorial on kmeans for pyPR.
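
    For the shapes in the 2-D case described above, here is a rough sketch in plain MATLAB. It assumes synthetic Gaussian blobs and a basic Lloyd-style loop written out by hand. It is not the PyPR code behind the image, and it is not meant to reflect how vl_kmeans is implemented internally.

    ```matlab
    % Synthetic 2-D data (assumed): three blobs, 1000 points, one per column.
    k = 3;
    dt = [randn(2, 334), ...
          randn(2, 333) + 6, ...
          [randn(1, 333) + 6; randn(1, 333) - 6]];

    % Initialise the centers from k randomly chosen data points.
    perm = randperm(size(dt, 2));
    centers = dt(:, perm(1:k));               % 2x3

    for iter = 1:20
        % Squared distance from every center to every point: k x 1000.
        d = zeros(k, size(dt, 2));
        for j = 1:k
            diffs = dt - repmat(centers(:, j), 1, size(dt, 2));
            d(j, :) = sum(diffs .^ 2, 1);
        end

        % Assignment step: index of the nearest center for each point.
        [~, assignments] = min(d, [], 1);     % 1x1000, values in 1..k

        % Update step: move each center to the mean of its points.
        for j = 1:k
            members = dt(:, assignments == j);
            if ~isempty(members)              % keep the old center if a cluster is empty
                centers(:, j) = mean(members, 2);
            end
        end
    end

    disp(size(dt));            % 2 1000
    disp(size(centers));       % 2 3
    disp(size(assignments));   % 1 1000
    ```

    The labels in assignments are 1-based here because MATLAB indexing is 1-based, which matches the "1 to 3" range mentioned above.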