Search code examples
pythonnumpyscipygmm

How to represent 1D vector as sum of Gaussian curves with scipy/numpy?


UPD: Thanks, it works.

I have an 1D-vector, which represents a histogram. It looks like sum of few gaussian functions: enter image description here

I've found curve_fit sample code on SO, but don't know how to modify it to receive more gaussian tuples (mu, sigma). I've heard 'curve_fit' optimizes only one function (in this case one gaussian curve).

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def estimate_sigma(hist):
    bin_edges = np.arange(len(hist))
    bin_centres = bin_edges + 0.5

    # Define model function to be used to fit to the data above:
    def gauss(x, *p):
        A, mu, sigma = p
        return A*numpy.exp(-(x-mu)**2/(2.*sigma**2))

    # p0 is the initial guess for the fitting coefficients (A, mu and sigma above)
    p0 = [1., 0., 1.]

    coeff, var_matrix = curve_fit(gauss, bin_centres, hist, p0=p0)

    # Get the fitted curve
    hist_fit = gauss(bin_centres, *coeff)

    plt.plot(bin_centres, hist, label='Test data')
    plt.plot(bin_centres, hist_fit, label='Fitted data')

    print 'Fitted mean = ', coeff[1]
    coeff2 =coeff[2]
    print 'Fitted standard deviation = ', coeff2

    plt.show()

This functions finds one gaussian curve, while visually there are 3 or 4 of them: enter image description here

Please, could you advice some numpy/scipy functions to achieve gmm representation of 1D vector in form ([m1, sigma1],[m2, sigma2],..,[mN,sigmaN])?


Solution

  • As tBuLi recommended, I passed additional Gaussian curves coefficients to gauss as well as to curve_fit. Now fitted curve looks so: enter image description here

    Updated code:

    def estimate_sigma(hist):
        bin_edges = np.arange(len(hist))
        bin_centres = bin_edges + 0.5
    
        # Define model function to be used to fit to the data above:
        def gauss(x, *gparams):
            g_count = len(gparams)/3
            def gauss_impl(x, A, mu, sigma):
                return A*numpy.exp(-(x-mu)**2/(2.*sigma**2))
            res = np.zeros(len(x))
            for gi in range(g_count):
                res += gauss_impl(x, gparams[gi*3], gparams[gi*3+1], gparams[gi*3+2])
            return res
    
        # p0 is the initial guess for the fitting coefficients (A, mu and sigma above)
        curves_count = 4
        p0 = np.tile([1., 0., 1.], curves_count)
    
        coeff, var_matrix = curve_fit(gauss, bin_centres, hist, p0=p0)
    
        # Get the fitted curve
        hist_fit = gauss(bin_centres, *coeff)
    
        plt.plot(bin_centres, hist, label='Test data')
        plt.plot(bin_centres, hist_fit, label='Fitted data')
    
        # Finally, lets get the fitting parameters, i.e. the mean and standard deviation:
        print coeff
    
        plt.show()