Tags: python, regression, polynomial-math

How to fit a polynomial (using np.polyfit or something else) with a condition on the intercept?


I want to write a generic function that takes two 1-D arrays and an integer N as input and returns the maximum-likelihood polynomial of degree N that fits my data. I want this polynomial to have zero intercept. I already wrote this:

import numpy as np

def calibrate_polynomial(cv, opening, N):
    # cv and opening are 1-D arrays of data; N is the degree of the desired polynomial
    print(np.polyfit(cv, opening, N))

I want to add a condition on the intercept. Does this exist in np.polyfit (it seems not, from what I read in the documentation), or do you have another function to do this? Thank you


Solution

  • The answer already present here mentions scipy.optimize.curve_fit. This is what I was going to suggest as well; however, I don't think setting boundaries is really the best way to do what you want. If you want the y-intercept fixed at zero, rather than setting boundaries you should simply define the model function you are fitting explicitly. For example, for a linear fit:

    from scipy.optimize import curve_fit

    Bpopt, Bpcov = curve_fit(lambda x, a: a*x, cv, opening, p0=1)
    

    Rather than using ax + b, you just use ax. Equations can similarly be defined to pass through (0, 0) for each polynomial degree you want to fit (see the generic sketch below).

    One thing that wasn't in the question, but that I feel I should point out just in case: you can't directly compare polynomials of different degrees. An equation with more degrees of freedom will almost always fit the data better than one with fewer. If you want to compare, for instance, a linear fit to a quadratic fit, you can't just fit both and take the one with the lowest error; the quadratic fit will almost always win.

    To make the comparison, some sort of hypothesis test has to be performed. You assume that the lower-degree polynomial really is the best fit and take the higher-degree fit as your alternative hypothesis. There are a lot of ways to do this; I'm personally partial to bootstrapping. As a confidence test, bootstrapping works by building a bunch of datasets by randomly sampling, with replacement, from your actual dataset, then checking whether the fit to your actual dataset is in the top 5% of fits to the randomly generated datasets. In this way, you can correct for the fact that a higher-degree equation will be a better fit for any random dataset than a lower-degree one. I've sketched one way to do this below but, no matter how you do it, you need a confidence test to compare fits to polynomials of different degrees.
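
To make that approach generic over the degree N from the question, here is a minimal sketch of a calibrate_polynomial that fits c1*x + c2*x**2 + ... + cN*x**N with curve_fit, so the curve passes through (0, 0) by construction. The function name comes from the question; the all-ones starting guess in p0 is my own assumption, not something from the original answer:

    import numpy as np
    from scipy.optimize import curve_fit

    def calibrate_polynomial(cv, opening, N):
        """Fit a degree-N polynomial with zero intercept to (cv, opening)."""
        def model(x, *coeffs):
            # c1*x + c2*x**2 + ... + cN*x**N -- no constant term,
            # so the intercept is zero by construction.
            return sum(c * x**k for k, c in enumerate(coeffs, start=1))

        # curve_fit infers the number of parameters from len(p0)
        # when the model takes *coeffs; ones are an arbitrary guess.
        popt, pcov = curve_fit(model, np.asarray(cv, dtype=float),
                               np.asarray(opening, dtype=float),
                               p0=np.ones(N))
        return popt  # [c1, ..., cN], lowest degree first

Since the model is linear in its coefficients, an equivalent route would be an ordinary least-squares solve against the columns x, x**2, ..., x**N (e.g. np.linalg.lstsq with the constant column dropped); curve_fit is used here just to stay close to the approach above.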
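
And here is one possible version of the bootstrap comparison, reusing calibrate_polynomial from the sketch above. A couple of hedges: this variant resamples the residuals of the lower-degree fit (a residual bootstrap under the null) rather than the raw data pairs, and the 2000 replicates and 5% cutoff are arbitrary choices, not part of the original answer:

    import numpy as np

    def sse(x, y, coeffs):
        # Sum of squared errors for a zero-intercept polynomial,
        # coeffs ordered [c1, ..., cN] (lowest degree first).
        pred = sum(c * x**k for k, c in enumerate(coeffs, start=1))
        return float(np.sum((y - pred) ** 2))

    def bootstrap_compare(cv, opening, n_low, n_high, n_boot=2000, seed=0):
        """Test whether a degree-n_high fit genuinely beats degree-n_low.

        Null hypothesis: the degree-n_low polynomial is the true model.
        """
        rng = np.random.default_rng(seed)
        x = np.asarray(cv, dtype=float)
        y = np.asarray(opening, dtype=float)

        low = calibrate_polynomial(x, y, n_low)
        high = calibrate_polynomial(x, y, n_high)
        observed_gain = sse(x, y, low) - sse(x, y, high)

        # Build synthetic datasets that, by construction, contain no
        # real higher-order structure: the low-degree fit plus
        # residuals resampled with replacement.
        fitted_low = sum(c * x**k for k, c in enumerate(low, start=1))
        residuals = y - fitted_low

        gains = np.empty(n_boot)
        for i in range(n_boot):
            y_sim = fitted_low + rng.choice(residuals, size=len(y))
            lo = calibrate_polynomial(x, y_sim, n_low)
            hi = calibrate_polynomial(x, y_sim, n_high)
            gains[i] = sse(x, y_sim, lo) - sse(x, y_sim, hi)

        # Fraction of null datasets where the higher degree "helped"
        # at least as much as it did on the real data.
        p_value = float(np.mean(gains >= observed_gain))
        return observed_gain, p_value

If p_value comes out below 0.05, the improvement from the higher-degree fit is bigger than what overfitting alone produces on data generated under the lower-degree model, so you would prefer the higher degree.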