I am working on a K-Means Clustering task and I am wondering if there is some way to do some kind of ranking of clusters, or maybe assign specific weights to some specific clusters. Is there a way to do this? Here is my code.
from pylab import plot,show
from numpy import vstack,array
from numpy.random import rand
import numpy as np
from scipy.cluster.vq import kmeans,vq
import pandas as pd
import pandas_datareader as dr
from math import sqrt
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
df = pd.read_csv('C:\\my_path\\analytics.csv')
data = np.asarray([np.asarray(dataset['Rating']),np.asarray(dataset['Maturity']),np.asarray(dataset['Score']),np.asarray(dataset['Bin']),np.asarray(dataset['Price1']),np.asarray(dataset['Price2']),np.asarray(dataset['Price3'])]).T
centroids,_ = kmeans(data,1000)
idx,_ = vq(data,centroids)
details = [(name,cluster) for name, cluster in zip(dataset.Cusip,idx)]
So, I get my 'details', I look at it, and everything seems fine at this point. I end up with around 700 clusters. I'm just wondering if there is a way to rank-order these clusters, assuming 'Rating' is the most important feature. Or, perhaps there is a way to assign a higher weight to 'Rating'. I'm not sure this makes 100% sense. I'm just thinking about the concept and wondering if there is some obvious solution or maybe this is just nonsense. I can easily count the records in each cluster, but I don't think that has any significance whatsoever. I Googled this and didn't find anything useful.
One "cheat" trick would be to use the feature rating
twice or three times, then it automatically gets more weight:
data = np.asarray([np.asarray(dataset['Rating']), np.asarray(dataset['Rating']), np.asarray(dataset['Maturity']),np.asarray(dataset['Score']),np.asarray(dataset['Bin']),np.asarray(dataset['Price1']),np.asarray(dataset['Price2']),np.asarray(dataset['Price3'])]).T
there are also adjustments of kmeans around, but they are not implemented in python.