I have this simple k-means function that I apply to a list of lists of floats:
import numpy as np
from sklearn.cluster import KMeans

def clustering(k, lists_to_cluster):
    max_vals = [max(sublist) for sublist in lists_to_cluster]
    kmeans_ampl = KMeans(k, random_state=123).fit(np.array(max_vals).reshape(-1, 1))
    # labels_ holds the cluster index assigned to each list, not the centroids
    centroids_ampl = kmeans_ampl.labels_
    return centroids_ampl

centroids_labels = clustering(3, lists_to_cluster)
centroids_labels is [0,0,1,2,2,0]
but the lists with the highest max_vals are labeled 0. I'd like the cluster labels to be sorted in ascending order of max_vals (label 0 is assigned to the lists with the lowest max_vals, and so on up to label k-1 for the highest max_vals).
Is there a way to do this before or while applying k-means, or should I just sort and map the labels afterwards?
Thanks !
You can group the max_vals by cluster into a dictionary that maps each cluster label to its list of max_vals.
Then sort the dictionary values (the lists) by their minimum, their maximum, or any other key, and number them in that order.
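def relabel(labels, vals):
    # Group the values by their original cluster label.
    d = {}
    for k, v in zip(labels, vals):
        d.setdefault(k, []).append(v)
    # Sort the groups and number them in ascending order.
    return list(enumerate(sorted(d.values(), key=min)))  # or key=max, or key=statistics.mean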
lists_to_cluster = [[1], [2], [3], [6], [7], [8], [101], [102], [103]]
max_vals = [max(sublist) for sublist in lists_to_cluster]
centroids_labels = clustering(3,lists_to_cluster)
print( relabel(centroids_labels, max_vals) )
# [(0, [1, 2, 3]), (1, [6, 7, 8]), (2, [101, 102, 103])]
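If you need a new label per list rather than the grouped values, the same idea can be turned into a mapping from old labels to sorted ones. The following is only a sketch: the name sorted_labels and the sample labels are made up for illustration, and it ranks clusters by their smallest value (swap in another statistic if that suits your data better).

```python
def sorted_labels(labels, vals):
    """Return new labels so that label 0 marks the cluster with the smallest values."""
    # Smallest value seen in each original cluster.
    cluster_min = {}
    for label, val in zip(labels, vals):
        cluster_min[label] = min(val, cluster_min.get(label, val))
    # Rank the original labels by that minimum; the rank becomes the new label.
    order = sorted(cluster_min, key=cluster_min.get)
    mapping = {old: new for new, old in enumerate(order)}
    return [mapping[label] for label in labels]

# Hypothetical labels, standing in for what KMeans might return.
labels = [2, 2, 2, 0, 0, 0, 1, 1, 1]
vals = [1, 2, 3, 6, 7, 8, 101, 102, 103]
print(sorted_labels(labels, vals))
# [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Since k-means labels are arbitrary, sorting and mapping after fitting like this is probably the simplest route. With scikit-learn you could also rank the clusters by the fitted model's cluster_centers_ instead of by the minimum value.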