
The predicted label for new data keeps changing when using KMeans after PCA


I am running KMeans clustering on PCA-reduced data. When I use the fitted model to cluster new data, the predicted label keeps changing from run to run (e.g. [2], then [3], then [1]).

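    import os

    import scipy.io.wavfile as wav
    from python_speech_features import logfbank, mfcc  # assumed source of mfcc/logfbank
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    # X is a pre-defined feature matrix (one row per training recording)
    pca = PCA(n_components=2)
    reduced_data = pca.fit_transform(X)
    kmeans = KMeans(n_clusters=4)
    kmeans.fit(reduced_data)

    for filename in os.listdir(directoryName):
        if filename.endswith('.wav'):
            # wavfile.read returns (sample_rate, samples)
            rate, signal = wav.read(os.path.join(directoryName, filename))
            mfcc_feat = mfcc(signal, rate, nfft=1200)
            fbank_feat = logfbank(signal, rate, nfft=1200)
            features = mean_features(mfcc_feat)  # user-defined helper, not shown
            reduced_data = pca.transform([features])
            y = kmeans.predict(reduced_data)
            print(y)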

And the output is:

    [1]
    [1]
    [1]

But when I run the code a second time without any modification, I get:

    [2]
    [2]
    [2]

The labels keep changing every time I run it.
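The symptom can be reproduced without the .wav files. The sketch below is a minimal illustration under assumptions: a hypothetical synthetic matrix from `make_blobs` stands in for `X`, one of its rows plays the role of the new sample, and each loop iteration uses a different `random_state` to mimic re-running a script whose KMeans is unseeded. The grouping itself comes out the same every time (adjusted Rand index of 1.0), but the number printed for the new sample may differ between runs, because cluster IDs are just arbitrary names for centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Hypothetical stand-in for X: four well-separated groups in 5-D
centers = np.array([[10, 10, 0, 0, 0],
                    [-10, 10, 0, 0, 0],
                    [10, -10, 0, 0, 0],
                    [-10, -10, 0, 0, 0]], dtype=float)
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.5, random_state=0)
new_sample = X[:1]  # pretend this row is a newly extracted feature vector

pca = PCA(n_components=2)
reduced_data = pca.fit_transform(X)
new_reduced = pca.transform(new_sample)

labelings = []
for seed in range(5):
    # A different seed per run mimics re-running a script with an unseeded KMeans
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=seed).fit(reduced_data)
    labelings.append(kmeans.labels_)
    print(f"run {seed}: new sample -> {kmeans.predict(new_reduced)}")

# Compare each run's partition with the first one; 1.0 means the same grouping
print([round(adjusted_rand_score(labelings[0], labels), 3) for labels in labelings])
```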


Solution

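  • The problem is that KMeans initializes its centroids randomly on every run, so each run can produce a different clustering, or the same clustering with the cluster numbers permuted. The label a new sample receives (e.g. [1] vs [2]) is only an ID for whichever centroid ended up at that index, so it changes from run to run. To fix this, pass a fixed seed:

    kmeans = KMeans(n_clusters=4, random_state=42)

Any integer works for the random_state parameter; what matters is that it stays the same between runs, which makes the result reproducible.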
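A minimal sketch of the seeded pipeline, assuming synthetic `make_blobs` data in place of the real `X` and the .wav features; `fit_and_predict` is a hypothetical helper introduced only for illustration. Calling it twice simulates running the script twice, and with the same seed both calls return the same labels. The seed is also passed to `PCA`, where `random_state` only has an effect if a randomized or ARPACK SVD solver is selected; fixing it is harmless otherwise.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA


def fit_and_predict(X, new_samples, seed=42):
    """Hypothetical helper: fit PCA + KMeans on X, then label new_samples."""
    # random_state on PCA matters only for the randomized/arpack solvers
    pca = PCA(n_components=2, random_state=seed)
    reduced = pca.fit_transform(X)
    # Fixed seed: centroid initialization is identical on every run
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=seed)
    kmeans.fit(reduced)
    return kmeans.predict(pca.transform(new_samples))


# Synthetic stand-in for X and for the features of new recordings (assumption)
X, _ = make_blobs(n_samples=200, n_features=13, centers=4, random_state=0)
new_samples = X[:3]

first = fit_and_predict(X, new_samples)
second = fit_and_predict(X, new_samples)  # simulates running the script again
print(first, second)
print(np.array_equal(first, second))  # same seed, so the labels should match
```

Note that the seed only makes the numbering repeatable; the cluster IDs still carry no meaning of their own beyond identifying a centroid.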