Get the cluster size in sklearn in python

I am using sklearn DBSCAN to cluster my data as follows.

from sklearn.cluster import DBSCAN

# Apply DBSCAN (sims is my data, a list of lists)
db1 = DBSCAN(min_samples=1, metric='precomputed').fit(sims)

db1_labels = db1.labels_
# Number of clusters, not counting the noise label -1 (e.g., 10 clusters)
db1n_clusters_ = len(set(db1_labels)) - (1 if -1 in db1_labels else 0)
print('Estimated number of clusters: %d' % db1n_clusters_)

Now I want to get the top 3 clusters sorted by size (the number of data points in each cluster). How can I obtain the cluster sizes in sklearn?


  • Another option would be to use numpy.unique:

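    import numpy as np

    db1_labels = db1.labels_
    # Count points per cluster, skipping the noise label -1
    labels, counts = np.unique(db1_labels[db1_labels>=0], return_counts=True)
    # Labels of the 3 largest clusters, largest first
    print(labels[np.argsort(-counts)[:3]])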
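
If you also want the sizes themselves, not just the labels, the same numpy.unique call can be wrapped as in the minimal sketch below. The helper name `top_cluster_sizes` and the array `example_labels` are made up for illustration; in your case you would pass `db1.labels_` instead.

```python
import numpy as np

# Hypothetical label array standing in for db1.labels_; -1 marks noise points.
example_labels = np.array([0, 0, 1, 2, 2, 2, -1, 3, 3, 3, 3, 1])

def top_cluster_sizes(labels, k=3):
    """Return (label, size) pairs for the k largest clusters, ignoring noise (-1)."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    # A stable sort keeps the lower label first when two clusters have equal size.
    order = np.argsort(-counts, kind="stable")[:k]
    return [(int(ids[i]), int(counts[i])) for i in order]

top3 = top_cluster_sizes(example_labels)
print(top3)  # [(3, 4), (2, 3), (0, 2)]
```

Clusters 0 and 1 both have two points here, so the stable sort decides which of them takes third place.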