I have the following data:
dd={2: [314, 334, 298, 316, 336, 325, 337, 344, 319, 323], 1: [749, 843, 831, 795, 769]}
I tried to cluster the list elements of each key (both 2 and 1) into 2 clusters using kmeans. Here is my code:
from scipy.cluster.vq import kmeans, vq
from collections import defaultdict
import numpy as np
dd={2: [314, 334, 298, 316, 336, 325, 337, 344, 319, 323], 1: [749, 843, 831, 795, 769]}
new_dd = defaultdict(list)
check_cluster_list = [len(x) for ii, x in dd.items()]
number_of_clusters = 2
if number_of_clusters > min(check_cluster_list):
    print("Clusters cannot be larger than", min(check_cluster_list))
    raise Exception(f"Clusters cannot be larger than {min(check_cluster_list)}")
for indx, (id, y) in enumerate(dd.items()):
    cluster_dict = defaultdict(list)
    codebook, _ = kmeans(np.array(y, dtype=float), number_of_clusters)
    cluster_indices, _ = vq(y, codebook)
But I need to use a different number of clusters for each key. For example:
key: 2 >> number_of_clusters=3
key: 1 >> number_of_clusters=2
My question is: how do I cluster those two lists with different values of number_of_clusters, 3 and 2 respectively?
IIUC, you just need to make number_of_clusters a list with one entry for each key,
and then use zip to iterate over the dict and the list together.
Note that this relies on the list being in the same order as the dict's keys.
number_of_clusters = [3, 2]
.
.
.
for num, (indx, (id, y)) in zip(number_of_clusters, enumerate(dd.items())):
    cluster_dict = defaultdict(list)
    codebook, _ = kmeans(np.array(y, dtype=float), num)
    cluster_indices, _ = vq(y, codebook)
    print(f"{codebook=}\n{cluster_indices=}\n")
codebook=array([319.4 , 298. , 337.75])
cluster_indices=array([0, 2, 1, 0, 2, 0, 2, 2, 0, 0])
codebook=array([771., 837.])
cluster_indices=array([0, 1, 1, 0, 0])
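If you would rather not depend on the list and the dict being in the same order, one possible variation is to key the cluster counts by the same keys as dd. This is only a sketch: clusters_per_key, results and grouped are names I made up for illustration, and the grouping step is my guess at what the unused cluster_dict was meant for. kmeans starts from random centroids, so the exact codebook and labels can differ between runs.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster.vq import kmeans, vq

dd = {2: [314, 334, 298, 316, 336, 325, 337, 344, 319, 323],
      1: [749, 843, 831, 795, 769]}

# Hypothetical mapping (not in the original post): key -> clusters wanted for that key.
clusters_per_key = {2: 3, 1: 2}

results = {}
for key, values in dd.items():
    k = clusters_per_key[key]
    # Check each key against its own list length instead of the global minimum.
    if k > len(values):
        raise ValueError(f"Key {key}: {k} clusters requested for {len(values)} values")

    obs = np.array(values, dtype=float)
    codebook, _ = kmeans(obs, k)            # centroids for this key
    cluster_indices, _ = vq(obs, codebook)  # nearest-centroid label for each value

    # Assumed intent of cluster_dict: collect the values that share a label.
    grouped = defaultdict(list)
    for value, label in zip(values, cluster_indices):
        grouped[int(label)].append(value)

    results[key] = (codebook, cluster_indices, dict(grouped))

for key, (codebook, cluster_indices, grouped) in results.items():
    print(f"{key=}\n{codebook=}\n{cluster_indices=}\n{grouped=}\n")
```

Looking up the count by key means adding or reordering entries in dd cannot silently pair a list with the wrong number of clusters; a missing key raises a KeyError instead.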