I am running K-means clustering after PCA. When I try to assign new data to a cluster, the predicted label changes from run to run (e.g. [2], then [3], then [1], ...).
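import os
import scipy.io.wavfile as wav
from python_speech_features import mfcc, logfbank  # assumed source of mfcc/logfbank
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# X is a pre-defined dataset
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(X)
kmeans = KMeans(n_clusters=4)
kmeans.fit(reduced_data)
for filename in os.listdir(directoryName):
    if filename.endswith('.wav'):
        # wav.read returns (sample rate, signal)
        (fs, signal) = wav.read(os.path.join(directoryName, filename))
        mfcc_feat = mfcc(signal, fs, nfft=1200)
        fbank_feat = logfbank(signal, fs, nfft=1200)
        features = mean_features(mfcc_feat)  # mean_features is my own helper, defined elsewhere
        reduced_data = pca.transform([features])
        y = kmeans.predict(reduced_data)
        print(y)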
And the output is:
[1]
[1]
[1]
But when I run the same code a second time, without any modification, I get:
[2]
[2]
[2]
The label keeps changing every time I rerun the script.
The problem is that K-means initializes its centroids randomly on every fit. Since you refit the model each time the script runs, the clusters can be found in a different order, so the same group of points ends up with a different label number. To fix this, set the random seed:
kmeans = KMeans(n_clusters=n, random_state=42)
You can pass any integer as random_state. Fixing it makes the centroid initialization, and therefore the cluster labels, reproducible across runs as long as the training data stays the same.
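To check this yourself, here is a minimal sketch. It assumes a synthetic random dataset in place of your X and your MFCC features (the 13-column shape and the helper name fit_and_predict are made up for illustration), and it sets n_init explicitly only to avoid depending on the scikit-learn version's default. It fits the PCA + K-means pipeline twice with the same seed and compares the predicted labels:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the real training data and new samples (assumption).
rng = np.random.RandomState(0)
X = rng.rand(200, 13)
new_samples = rng.rand(3, 13)


def fit_and_predict(train, samples, seed):
    """Fit PCA + K-means from scratch and label the new samples."""
    pca = PCA(n_components=2)
    reduced = pca.fit_transform(train)
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=seed)
    kmeans.fit(reduced)
    # New data must go through the same fitted PCA before predict().
    return kmeans.predict(pca.transform(samples))


# Two independent fits with the same seed give the same labels.
first = fit_and_predict(X, new_samples, seed=42)
second = fit_and_predict(X, new_samples, seed=42)
print(first, second, np.array_equal(first, second))
```

If you drop random_state (or pass a different seed each time), the grouping may stay much the same while the label numbers are permuted, which is what you are seeing.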