python scipy curve-fitting least-squares

scipy polyfit with x, y and weights = error bars


I would like to fit a line that uses the inverse of the error bars as weights.

I binned my data x and y into 10 bins (~26 points each) and took their means. The result is not a matrix of values, so polyfit isn't happy with it.

# note: I didn't test this pseudocode...
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100)
y = np.random.rand(100)

n_bins = 10
x_edges = np.linspace(x.min(), x.max(), n_bins + 1)  # 11 edges -> 10 bins
y_edges = np.linspace(y.min(), y.max(), n_bins + 1)

# digitize against the interior edges only, so the bin indices run from 0 to 9
x_bin = np.digitize(x, x_edges[1:-1])
y_bin = np.digitize(y, y_edges[1:-1])

x_mu = np.zeros(n_bins)
y_mu = x_mu.copy()
err = x_mu.copy()

for i in range(n_bins):
    # store each bin's statistic at index i; an empty bin gives NaN
    x_mu[i] = np.mean(x[x_bin == i])
    y_mu[i] = np.mean(y[y_bin == i])
    err[i] = np.std(y[y_bin == i])

# replace the NaN of empty bins with 0
x_mu[np.isnan(x_mu)] = 0
y_mu[np.isnan(y_mu)] = 0
err[np.isnan(err)] = 0

plt.errorbar(x_mu, y_mu, err, fmt='o')

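EDIT: scipy.polyfit (an alias of numpy.polyfit) stopped complaining about ill-conditioned inputs...

ok = err > 0  # skip bins with zero spread, whose inverse weight would be infinite
out = np.polyfit(x_mu[ok], y_mu[ok], deg=1, w=1 / err[ok])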

Solution

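  • numpy.polyfit does not take the uncertainties themselves; it only accepts weights through its w argument (1/sigma for Gaussian uncertainties). To specify the uncertainties explicitly you could use scipy.optimize.curve_fit instead, e.g.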

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.optimize
    
    x = np.linspace(0, 1, 100)
    y = np.random.rand(100)
    
    # bin the data
    n, bins = np.histogram(y, 10, [0, 1])
    xb = bins[:-1] + 0.05  # bin centers: left edges plus half the bin width of 0.1
    yb = n                 # just the per-bin counts
    err = np.sqrt(n)       # from Poisson statistics
    plt.errorbar(xb, yb, err, fmt='ro')
    
    # fit a polynomial of degree 1, no explicit uncertainty
    a1, b1 = np.polyfit(xb, yb, 1)
    plt.plot(xb, a1*xb + b1, 'b')
    
    # fit explicitly taking uncertainty into account
    f = lambda x, a, b: a*x + b  # function to fit
    # fit with initial guess for parameters [1, 1]; cov is their covariance matrix
    pars, cov = scipy.optimize.curve_fit(f, xb, yb, p0=[1, 1], sigma=err)
    a2, b2 = pars
    plt.plot(xb, a2*xb + b2, 'r')
    plt.show()
    


    To properly interpret the fit you need to examine the covariance matrix of the fitted parameters (the second value returned by curve_fit), but that goes beyond this technical question.
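As a rough sketch of what examining that matrix can look like: the second value returned by curve_fit is the covariance matrix of the fitted parameters. The square roots of its diagonal give the 1-sigma parameter uncertainties, and dividing it by the outer product of those uncertainties gives the correlation matrix. The synthetic data, the fixed seed, the constant error bar of 0.1, and the helper name `line` are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit


def line(x, a, b):
    """Straight-line model with slope a and intercept b (hypothetical helper)."""
    return a * x + b


# Synthetic data, assumed for illustration: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
sigma = np.full_like(x, 0.1)  # assumed 1-sigma error bar on every point
y = line(x, 2.0, 1.0) + rng.normal(0.0, sigma)

# absolute_sigma=True treats sigma as real uncertainties, so cov is not rescaled
pars, cov = curve_fit(line, x, y, p0=[1.0, 1.0], sigma=sigma, absolute_sigma=True)

par_err = np.sqrt(np.diag(cov))          # 1-sigma uncertainty of a and b
corr = cov / np.outer(par_err, par_err)  # correlation matrix, unit diagonal

print("a = %.3f +/- %.3f" % (pars[0], par_err[0]))
print("b = %.3f +/- %.3f" % (pars[1], par_err[1]))
print("correlation(a, b) = %.3f" % corr[0, 1])
```

With all x values on the positive side, slope and intercept come out negatively correlated: a steeper line has to start lower to pass through the same points. If the error bars are only relative weights rather than real uncertainties, leaving absolute_sigma at its default of False is probably the better choice, since curve_fit then rescales the covariance by the scatter of the residuals.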