Tags: python, histogram, curve-fitting, covariance, pcov

Possible to see estimated parameter array of curve_fit function used to calculate variance?


I am working with scipy's curve_fit function and using pcov to estimate the parameter errors. I understand that pcov gives the variance of each fitted parameter. A variance is normally calculated from a set of numbers, which here would be estimates of the respective parameters. I am wondering whether it is possible to produce this set of estimates so that I can inspect it and calculate the variance by hand. I would also like to plot a histogram of these values. Does anyone know how I could do this, or whether there is a separate function that might help?

Here is my current code:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2, 4, 9, 16, 25])

def function(x, a, b):
    return a * np.exp(-b*x)

popt, pcov = curve_fit(function, x_data, y_data)
a, b = popt

print("pcov =", pcov)

print("a +/- standard deviation =", popt[0], "+/-", pcov[0,0]**0.5)

print("b +/- standard deviation =", popt[1], "+/-", pcov[1,1]**0.5)

print("a +/- standard error =", popt[0], "+/-", ((pcov[0,0]**0.5)/np.sqrt(5)))

print("b +/- standard error =", popt[1], "+/-", ((pcov[1,1]**0.5)/np.sqrt(5)))

print("Confidence Interval:", CI)

plt.plot(x_data, function(x_data, *popt), "k--", label='Curve Fit')
plt.plot(x_data, y_data, 'o', label='Raw Data')
plt.xlabel("Time")
plt.ylabel("AB")
plt.legend()
plt.show()

which produces:

(plot: raw data and curve fit, Time vs AB)

To reiterate, I want to produce the set of values used to calculate the variance of each parameter. Thank you!


Solution

  • If I correctly understand your post, there are two questions:

    • How can I plot a histogram of the residuals?
    • How is the covariance matrix computed?

    Let's build a representative example:

    import numpy as np
    import numdifftools as nd
    import matplotlib.pyplot as plt
    from scipy import optimize, stats
    

    We first define your model, then create some data for it with normal noise of sigma 0.1:

    def model(x, a, b):
        return a * np.exp(-b * x)

    np.random.seed(123456)
    x = np.linspace(-1, 1, 3000)
    p = (2.5, 1.2)                        # true parameters
    y = model(x, *p)
    s = 0.1 * np.ones_like(x)             # known noise level
    n = s * np.random.normal(size=x.size)
    yn = y + n
    

    Compute residuals distribution

    The curve_fit method handles this fit without problems:

    popt, pcov = optimize.curve_fit(model, x, yn, sigma=s, absolute_sigma=True)
    #(array([2.50042509, 1.19873061]),
    # array([[ 5.22097958e-06, -2.50399907e-06],
    #        [-2.50399907e-06,  1.66936758e-06]]))
    
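    As a side note, the standard deviations of the parameters are simply the square roots of the diagonal of this matrix, so a one-liner gets them all at once:

    perr = np.sqrt(np.diag(pcov))  # one standard deviation per parameter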

    From this fit, we can compute the fitted curve and the residuals:

    yhat = model(x, *popt)
    residuals = yn - yhat
    

    The fit looks like this:

    fig, axe = plt.subplots()
    axe.plot(x, yn)
    axe.plot(x, yhat)
    axe.grid()
    

    (plot: noisy data with the fitted curve overlaid)

    We can fit the residuals to a normal distribution to check our result:

    p = stats.norm.fit(residuals)
    # (-3.506208994573257e-05, 0.10181543029512476)
    

    The residuals are centered almost exactly at 0, with a sigma of about 0.1, as expected. Graphically it looks like this:

    law = stats.norm(*p)
    rlin = np.linspace(residuals.min(), residuals.max(), 200)
    
    fig, axe = plt.subplots()
    axe.hist(residuals, 15, density=1.)
    axe.plot(rlin, law.pdf(rlin))
    axe.grid()
    

    (plot: histogram of the residuals with the fitted normal density)

    Estimate Covariance Matrix

    Now let's focus on how the covariance matrix is estimated and how it is linked to the residuals.

    The covariance matrix is estimated as the inverse of the Hessian of the loss function at the optimal solution: the sharper the curvature of the loss around its minimum, the more tightly the data constrain the parameters, and the smaller their variances.

    The curve_fit method uses a chi-square loss function and minimizes it with respect to the parameters, with the dataset held fixed. We can write such a function as follows:

    # Bind the dataset so the loss depends only on the parameters:
    def loss_factory(x, y, s):
        def wrapped(p):
            return 0.5 * np.sum(np.power((y - model(x, *p)) / s, 2))
        return wrapped

    loss = loss_factory(x, yn, s)
    

    The goal is then to:

    • Minimize the loss function with respect to p, knowing x, y and s;
    • Estimate the Hessian at the solution x0 that minimizes the loss function;
    • Invert the Hessian.

    We minimize the loss:

    sol = optimize.minimize(loss, x0=[1., 1.], tol=1e-4)
    #  message: Optimization terminated successfully.
    #  success: True
    #   status: 0
    #      fun: 1554.9574613298098
    #        x: [ 2.500e+00  1.199e+00]
    #      nit: 11
    #      jac: [-9.155e-05 -4.578e-05]
    # hess_inv: [[ 5.208e-06 -2.497e-06]
    #            [-2.497e-06  1.666e-06]]
    #     nfev: 60
    #     njev: 20
    

    The solution object already provides an estimate of the inverse of the Hessian (hess_inv), which agrees with the covariance matrix returned by curve_fit.
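
    For instance (a quick check, reusing sol and pcov from above), parameter standard deviations can be pulled from it exactly as from pcov:

    perr_bfgs = np.sqrt(np.diag(sol.hess_inv))  # from the BFGS approximation
    perr_fit = np.sqrt(np.diag(pcov))           # from curve_fit
    print(perr_bfgs, perr_fit)                  # the two should be close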

    If needed, we can refine this estimate using the numdifftools package:

    H = nd.Hessian(loss)(sol.x)
    C = np.linalg.inv(H)
    # array([[ 5.22047492e-06, -2.50366278e-06],
    #        [-2.50366278e-06,  1.66914356e-06]])
    

    This also agrees with the covariance matrix returned by curve_fit.
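
    Finally, to come back to the original question: curve_fit does not build pcov from a set of repeated estimates, but we can generate such a set ourselves with a small Monte Carlo simulation. The sketch below (reusing model, x, y, s and pcov from above; the trial count of 500 is an arbitrary choice) refits the model on freshly resampled noisy data, so the sample variances of the collected estimates should be close to the diagonal of pcov, and each column can be histogrammed just like the residuals above:

    n_trials = 500                     # arbitrary number of synthetic datasets
    estimates = np.empty((n_trials, len(popt)))
    for i in range(n_trials):
        # Draw a new noisy dataset with the same noise level and refit:
        y_trial = y + s * np.random.normal(size=x.size)
        p_trial, _ = optimize.curve_fit(model, x, y_trial, sigma=s,
                                        absolute_sigma=True)
        estimates[i] = p_trial

    # Sample variances of the estimates vs. the diagonal of pcov:
    print(estimates.var(axis=0, ddof=1))  # compare with np.diag(pcov)

    # Histogram of the estimates for each parameter:
    fig, axes = plt.subplots(1, 2)
    for j, axe in enumerate(axes):
        axe.hist(estimates[:, j], 20)
        axe.grid()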