Tags: python, scikit-learn, cluster-analysis, k-means

How to print the result of clustering in sklearn


I have a sparse matrix:

from scipy.sparse import csr_matrix
M = csr_matrix((data_np, (rows_np, columns_np)))

Then I cluster it like this:

from sklearn.cluster import KMeans
km = KMeans(n_clusters=n, init='random', max_iter=100, n_init=1, verbose=1)
km.fit(M)

My question is very basic: how do I print the clustering result without any extra information? I don't care about plotting or distances. I just need the clustered rows listed like this:

Cluster 1
row 1
row 2
row 3

Cluster 2
row 4
row 20
row 1000
...

How can I get this output? Sorry for the basic question.


Solution

  • Time to help myself. After

    km.fit(M)
    

    we run

    labels = km.predict(M)
    

    which returns labels, a numpy.ndarray with one element per row of M. Each element is the index of the cluster that the corresponding row belongs to. For example, if the first element is 5, row 1 belongs to cluster 5. (The labels assigned during fit are also stored in km.labels_.) Let's put the rows in a dictionary of lists of the form {cluster_number: [row1, row2, row3], ...}:

    # row_dict stores the actual meaning of each row; in my case, Russian words
    clusters = {}
    n = 0  # index of the current row
    for item in labels:
        if item in clusters:
            clusters[item].append(row_dict[n])
        else:
            clusters[item] = [row_dict[n]]
        n += 1
    

    and print the result (Python 3 syntax):

    for item in clusters:
        print("Cluster", item)
        for i in clusters[item]:
            print(i)
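
For reference, the steps above can be combined into a small end-to-end sketch for Python 3. The toy matrix and the row_names list are made up for illustration (row_names stands in for row_dict), random_state=0 is added only to make runs repeatable, and km.labels_ is used here instead of a second predict call. Treat it as one possible way to do the grouping, not the only one.

```python
from collections import defaultdict

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

# Toy data (hypothetical): six rows forming two well-separated groups.
dense = np.array([
    [0.0, 0.1],
    [0.1, 0.0],
    [0.0, 0.0],
    [10.0, 10.1],
    [10.1, 10.0],
    [10.0, 10.0],
])
M = csr_matrix(dense)

# Hypothetical row names; these play the role of row_dict in the answer.
row_names = ["row 1", "row 2", "row 3", "row 4", "row 5", "row 6"]

# Same settings as in the question, minus verbose; random_state is an
# assumption added for repeatability.
km = KMeans(n_clusters=2, init="random", max_iter=100, n_init=1, random_state=0)
km.fit(M)

# labels_ holds the cluster index assigned to each training row during fit.
labels = km.labels_

# Group row names by cluster index; defaultdict removes the if/else branch.
clusters = defaultdict(list)
for row_index, label in enumerate(labels):
    clusters[int(label)].append(row_names[row_index])

# Print each cluster followed by its rows, with a blank line between clusters.
for label in sorted(clusters):
    print("Cluster", label)
    for name in clusters[label]:
        print(name)
    print()
```

Cluster indices are arbitrary and start at 0, so which group ends up as cluster 0 depends on the initialization.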