
Gaussian Mixture Model with discrete data


I have 136 numbers whose distribution is a superposition of 8 overlapping Gaussian distributions. I want to find the mean and variance of each Gaussian. Can you find any mistakes in my code?

file = open("1.txt",'r') # 1.txt holds comma-separated values such as 0,0,0,0,0,0,1,0,0,1,4,4,6,14,25,43,71,93,123,194...

y=[int (i) for i in list((file.read()).split(','))] # parse the file contents into a list of ints

x=list(range(1,len(y)+1)) # x values: 1, 2, ..., len(y)

z=list(zip(x,y)) # z holds the pairs (1, 0), (2, 0), ...

The steps above produce a list z of 136 points (x, y) on the xy plane, where the y values are the given data and x runs from 1 to 136. Now I want to obtain the mean and variance of each Gaussian distribution. The basic assumption is that the given data is made up of 8 overlapping Gaussian distributions.

import numpy as np

from sklearn.mixture import GaussianMixture

data = np.array(z).reshape(-1,1)

model = GaussianMixture(n_components=8).fit(data)

print(model.means_)

file.close()

Actually, I don't know how to make this code print the 8 means and variances. Can anyone help me?


Solution

  • You can use this. I have written some sample code that fits the model and visualizes the fitted distributions:

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from scipy import stats
    import matplotlib.pyplot as plt
    # In a Jupyter notebook, also run: %matplotlib inline
    
    # Sample data
    x = [0,0,0,0,0,0,1,0,0,1,4,4,6,14,25,43,71,93,123,194]
    num_components = 2
    
    # Fit a model to the data (one sample per row)
    data = np.array(x).reshape(-1,1)
    model = GaussianMixture(n_components=num_components).fit(data)
    
    # Get the mean, variance and standard deviation of each component
    mu = model.means_.flatten()
    var = model.covariances_.flatten()
    sd = np.sqrt(var)
    print(mu)
    print(var)
    
    # Plotting
    extend_window = 50  # padding added on both sides of the data range; larger values zoom out
    x_values = np.arange(data.min()-extend_window, data.max()+extend_window, 0.1)  # fine grid for smooth curves
    plt.plot(data, np.zeros(data.shape), linestyle='None', markersize=10.0, marker='o')  # plot the data on the x axis
    
    # Plot the density of each fitted component (2 in this case)
    for i in range(num_components):
        y_values = stats.norm(mu[i], sd[i])
        plt.plot(x_values, y_values.pdf(x_values))
    plt.show()
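
One thing to watch in the question's code: `np.array(z).reshape(-1,1)` flattens the (x, y) pairs into a single column, so positions and values get mixed together as if they were all samples. If the numbers in 1.txt are counts per position (a histogram), which is an assumption about the data and not something the question states outright, one possible approach is to repeat each position by its count and fit the mixture to those samples. The sketch below uses the short excerpt from the question as a stand-in for the file, with 2 components instead of 8 because the excerpt is so short; the names `counts`, `positions` and `samples` are only illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for the contents of 1.txt:
# counts[i] is assumed to be the number of observations at position i + 1.
counts = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 4, 4, 6, 14, 25, 43, 71, 93, 123, 194])
positions = np.arange(1, len(counts) + 1)

# Expand the histogram into one row per observation:
# each position is repeated as many times as its count.
samples = np.repeat(positions, counts).reshape(-1, 1)

# Assumption: 2 components for this short excerpt; the question uses 8 on the full data.
n_components = 2
model = GaussianMixture(n_components=n_components, random_state=0).fit(samples)

means = model.means_.ravel()
# For 1-D data with the default covariance_type="full", covariances_ has
# shape (n_components, 1, 1), so ravel() gives one variance per component.
variances = model.covariances_.ravel()
weights = model.weights_

# Print the components ordered by mean
for k in np.argsort(means):
    print(f"component {k}: mean={means[k]:.2f}, variance={variances[k]:.2f}, weight={weights[k]:.3f}")
```

With the full 136 values, setting `n_components = 8` would print eight means and variances. The order of the components returned by the fit is arbitrary, which is why the loop sorts them by mean.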
    
    
