Suppose we have 64-dimensional data to cluster, and let's say the dataset is a matrix dt of size 64x150.
Using the kmeans function from the VLFeat library (vl_kmeans), I cluster my dataset into 20 centers:
[centers, assignments] = vl_kmeans(dt, 20);
centers is a 64x20 matrix.
assignments is a 1x150 matrix of integer values.
According to the manual: "The vector assignments contains the (hard) assignments of the input data to the clusters."
I still cannot understand what the numbers in assignments mean. Could anyone help me out here? An example would be great. What do these values represent?
In k-means, the problem you are trying to solve is clustering your 150 points into 20 clusters. Each point is 64-dimensional and is therefore represented by a vector of size 64. So in your case dt is the set of points, and each column of dt is one 64-dim vector.
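To make the "one point per column" layout concrete, here is a minimal Python sketch (not VLFeat code). It assumes a hypothetical tiny 3x4 matrix standing in for your 64x150 dt, stored as a list of rows, and a hypothetical helper point() that mimics MATLAB's dt(:, j) with 1-based indexing.

```python
# Hypothetical tiny stand-in for dt: 3 dimensions x 4 points (instead of 64 x 150).
# Stored as a list of rows, so dt[r][c] is row r, column c (0-based in Python).
dt = [
    [1.0, 2.0, 3.0, 4.0],     # dimension 1 of points 1..4
    [5.0, 6.0, 7.0, 8.0],     # dimension 2 of points 1..4
    [9.0, 10.0, 11.0, 12.0],  # dimension 3 of points 1..4
]


def point(matrix, j):
    """Return the j-th point (1-based, like MATLAB's dt(:, j)) as a list."""
    return [row[j - 1] for row in matrix]


num_dims = len(dt)       # 3 here, 64 in the question
num_points = len(dt[0])  # 4 here, 150 in the question

print(num_dims, num_points)  # 3 4
print(point(dt, 2))          # [2.0, 6.0, 10.0]
```

Each point is a whole column, so a 64x150 dt holds 150 points of 64 numbers each.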
After running the algorithm you get centers and assignments. centers holds the positions of the 20 cluster centers in the 64-dim space (one center per column), which you can use to visualize the clusters, measure distances between points and clusters, etc. assignments, on the other hand, holds the actual cluster assignment of each 64-dim point in dt. So if assignments(7) is 15, the 7th vector in dt (i.e. dt(:,7)) belongs to the 15th cluster, whose center is centers(:,15).
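A "hard" assignment means each point gets exactly one cluster index. With Euclidean distance and a converged run, that index should be the nearest center. The following plain-Python sketch (a hypothetical illustration, not vl_kmeans itself) computes such 1-based assignments for made-up 2-D data, then shows how to read and invert them.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assign(points, centers):
    """Return one 1-based cluster index per point: the index of its nearest center."""
    assignments = []
    for p in points:
        dists = [squared_distance(p, c) for c in centers]
        # index of the smallest distance, +1 to mimic MATLAB's 1-based numbering
        assignments.append(dists.index(min(dists)) + 1)
    return assignments


# Hypothetical 2-D data: each inner list plays the role of one column of centers / dt.
centers = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
points = [[1.0, 1.0], [9.0, 8.0], [-1.0, 11.0], [0.5, -0.2], [11.0, 9.0]]

assignments = hard_assign(points, centers)
print(assignments)  # [1, 2, 3, 1, 2]

# MATLAB's assignments(2) is assignments[1] in Python: point 2 belongs to cluster 2.
print(assignments[1])  # 2

# Invert the mapping: which points (1-based) belong to cluster 2?
members_of_2 = [j for j, a in enumerate(assignments, start=1) if a == 2]
print(members_of_2)  # [2, 5]
```

The MATLAB equivalent of the last line would be find(assignments == 2).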
For example, consider clustering lots of 2-D points, let's say 1000, into 3 clusters. In this case dt would be 2x1000, centers would be 2x3, and assignments would be 1x1000 and would hold numbers ranging from 1 to 3 (or 0 to 2, in case you're using OpenCV).
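To see those shapes end to end, here is a minimal sketch of plain Lloyd's k-means in pure Python. It is an assumption-laden stand-in, not VLFeat's, pyPR's or OpenCV's implementation: the blob means, the spread, the seeds and the kmeans() helper are all hypothetical. Centers come back as a list of 3 vectors of length 2 (the transpose of MATLAB's 2x3 layout), and assignments come back 1-based, with a 0-based OpenCV-style copy alongside.

```python
import random


def nearest(p, centers):
    """0-based index of the center closest to p (squared Euclidean distance)."""
    dists = [sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers]
    return dists.index(min(dists))


def kmeans(points, k, max_iters=100, seed=0):
    """Plain Lloyd's k-means. Returns (centers, assignments) with 1-based assignments."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # start from k distinct data points
    assignments = None
    for _ in range(max_iters):
        new_assign = [nearest(p, centers) for p in points]
        if new_assign == assignments:
            break  # converged: no point changed cluster
        assignments = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, [a + 1 for a in assignments]


# Hypothetical data: 1000 2-D points drawn around three made-up blob means.
rng = random.Random(42)
blob_means = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    [rng.gauss(mx, 0.5), rng.gauss(my, 0.5)]
    for mx, my in (blob_means[i % 3] for i in range(1000))
]

centers, assignments = kmeans(points, k=3)
zero_based = [a - 1 for a in assignments]  # OpenCV-style labels

print(len(points), len(centers), len(assignments))  # 1000 3 1000
print(min(assignments), max(assignments))
print(min(zero_based), max(zero_based))
```

Which blob ends up labeled 1, 2 or 3 depends on the initialization, so cluster numbers are arbitrary names: only "same number, same cluster" is meaningful.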
EDIT: The code that produces the example image for this clustering is located here: http://pypr.sourceforge.net/kmeans.html#k-means-example along with a tutorial on k-means for pyPR.