cluster-analysis data-mining gaussian gmm

Python sklearn GaussianMixture: how to get the samples/points in each cluster


I am using a GMM to cluster my dataset into K groups. The model runs well, but I cannot find a way to get the raw data points that belong to each cluster. Can you suggest how to solve this problem? Thank you so much.


Solution

  • You can filter the data by the label that predict() assigns to each sample (look at d0, d1, and d2).

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn import datasets
    from sklearn.mixture import GaussianMixture
    
    # load the iris dataset 
    iris = datasets.load_iris() 
    
    # select first two columns  
    X = iris.data[:, 0:2] 
    
    # turn it into a dataframe 
    d = pd.DataFrame(X) 
    
    # plot the data 
    plt.scatter(d[0], d[1]) 
    
    gmm = GaussianMixture(n_components=3)
    
    # Fit the GMM to the dataset, which models
    # the data as a mixture of 3 Gaussian
    # distributions
    gmm.fit(d)
    
    # Assign a cluster label to each sample
    labels = gmm.predict(d)
    d['labels'] = labels
    
    # Select the rows that belong to each cluster
    d0 = d[d['labels'] == 0]
    d1 = d[d['labels'] == 1]
    d2 = d[d['labels'] == 2]
    
    # d0, d1, and d2 hold the raw samples of each cluster
    print(d0)
    print(d1)
    print(d2)
    
    # plot the three clusters in the same plot
    plt.scatter(d0[0], d0[1], c='r')
    plt.scatter(d1[0], d1[1], c='yellow')
    plt.scatter(d2[0], d2[1], c='g')
    # needed to display the figure when run as a script
    plt.show()
    


    # print the lower bound on the log-likelihood
    # reached by the best fit of EM
    print(gmm.lower_bound_)
    
    # print the number of iterations needed 
    # for the log-likelihood value to converge 
    print(gmm.n_iter_)
    
    # in the original run it took 8 iterations to converge; the count
    # can differ between runs because no random_state is set.
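
  • Since the question asks about K groups in general, the same idea can be written without hard-coding d0, d1, and d2. The following is a minimal sketch, not a definitive implementation: it assumes the same iris columns as above, the names K, clusters, and probs are chosen only for illustration, and random_state=0 is an added assumption so that repeated runs assign the same labels.

```python
import numpy as np
from sklearn import datasets
from sklearn.mixture import GaussianMixture

# Same data as above: first two columns of the iris dataset
X = datasets.load_iris().data[:, 0:2]

K = 3  # number of clusters; illustrative, any K >= 1 works here

# random_state is an assumption added for repeatable labels
gmm = GaussianMixture(n_components=K, random_state=0).fit(X)

# Hard assignment: one cluster index per sample
labels = gmm.predict(X)

# Map each cluster index to the raw rows assigned to it
clusters = {k: X[labels == k] for k in range(K)}

for k, points in clusters.items():
    print(f"cluster {k}: {len(points)} samples")

# Soft assignment: one row per sample, one column per cluster,
# each row holding the membership probabilities for that sample
probs = gmm.predict_proba(X)
```

    Here clusters[k] is a NumPy array with the raw points of cluster k. If the data already lives in a DataFrame with a 'labels' column, d.groupby('labels') gives an equivalent split.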