Search code examples
pythoncurve-fittingspline

Fit curve to data, get analytical form, & check when curve crosses threshold


I have 40 points for each curve and I would like to smooth the function and estimate when the curve crosses a threshold on the y axis. Is there a fitting function that I can easily apply this to, I can use interpolate to plot the new function but I can't figure out how to request the x value for which y = threshold.

Unfortunately the curves don't all have the same shape so I can't use scipy.optimize.curve_fit.

Thanks!

Update: Adding two curves:

Curve 1

[942.153,353.081,53.088,125.110,140.851,188.170,70.536,-122.473,-369.061,-407.945,88.734,484.334,267.762,65.831,74.010,-55.781,-260.024,-466.830,-524.511,-76.833,-36.779,-117.366,218.578,175.662,185.653,299.285,215.276,546.048,1210.132,3087.326,7052.849,13867.824,27156.939,51379.664,91908.266,148874.563,215825.031,290073.219,369567.781,437031.688]

Curve 2

[-39034.039,-34637.941,-24945.094,-16697.996,-9247.398,-2002.051,3409.047,3658.145,7542.242,11781.340,11227.688,10089.035,9155.883,8413.980,5289.578,3150.676,4590.023,6342.871,3294.719,580.567,-938.586,-3919.738,-5580.390,-3141.793,-2785.945,-2683.597,-4287.750,-4947.902,-7347.554,-8919.457,-6403.359,-6722.011,-8181.414,-6807.566,-7603.218,-6298.371,-6909.523,-5878.675,-5193.578,-7193.980]

x values are

[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]

Solution

  • For fitting a smooth curve, you can fit Legendre polynomials using numpy.polynomial.legendre.Legendre's fit method.


    # import packages we need later
    import matplotlib.pyplot as plt
    import numpy as np
    

    Fitting Legendre Polynomials

    Preparing data as numpy arrays:

    curve1 = \
    np.asarray([942.153,353.081,53.088,125.110,140.851,188.170,70.536,-122.473,-369.061,-407.945,88.734,484.334,267.762,65.831,74.010,-55.781,-260.024,-466.830,-524.511,-76.833,-36.779,-117.366,218.578,175.662,185.653,299.285,215.276,546.048,1210.132,3087.326,7052.849,13867.824,27156.939,51379.664,91908.266,148874.563,215825.031,290073.219,369567.781,437031.688])
    curve2 = \
    np.asarray([-39034.039,-34637.941,-24945.094,-16697.996,-9247.398,-2002.051,3409.047,3658.145,7542.242,11781.340,11227.688,10089.035,9155.883,8413.980,5289.578,3150.676,4590.023,6342.871,3294.719,580.567,-938.586,-3919.738,-5580.390,-3141.793,-2785.945,-2683.597,-4287.750,-4947.902,-7347.554,-8919.457,-6403.359,-6722.011,-8181.414,-6807.566,-7603.218,-6298.371,-6909.523,-5878.675,-5193.578,-7193.980])
    xvals = \
    np.asarray([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40])
    

    Lets fit the Legendre polynomials, degree being the highest degree polynomial used, the first few is here for example.

    degree=10
    legendrefit_curve1 = np.polynomial.legendre.Legendre.fit(xvals, curve1, deg=degree)
    legendrefit_curve2 = np.polynomial.legendre.Legendre.fit(xvals, curve2, deg=degree)
    

    Calculate these fitted curves at evenly spaced points using the linspace method. n is the number of point pairs we want to have.

    n=100
    fitted_vals_curve1 = legendrefit_curve1.linspace(n=n)
    fitted_vals_curve2 = legendrefit_curve2.linspace(n=n)
    

    Let's plot the result, along with a threshold (using axvline):

    plt.scatter(xvals, curve1)
    plt.scatter(xvals, curve2)
    
    plt.plot(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
    plt.plot(fitted_vals_curve2[0],fitted_vals_curve2[1],c='k')
    
    threshold=100000
    plt.axhline(y=threshold)
    

    enter image description here

    The curves fit beautifully.


    When is the threshold crossed?

    To check where the threshold is crossed in each series, you can do:

    for x, y in zip(fitted_vals_curve1[0], fitted_vals_curve1[1]):
        if y > threshold:
            xcross_curve1 = x
            break
    
    for x, y in zip(fitted_vals_curve2[0], fitted_vals_curve2[1]):
        if y > threshold:
            xcross_curve2 = x
            break
    

    xcross_curve1 and xcross_curve2 will hold the x value where curve1 and curve2 crossed the threshold if they did cross the threshold; if they did not, they will be undefined.

    Let's plot them to check if it works (link to axhline docs):

    plt.scatter(xvals, curve1)
    plt.scatter(xvals, curve2)
    
    plt.plot(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
    plt.plot(fitted_vals_curve2[0],fitted_vals_curve2[1],c='k')
    
    plt.axhline(y=threshold)
    
    try: plt.axvline(x=xcross_curve1)
    except NameError: print('curve1 is not passing the threshold',c='b')
    
    try: plt.axvline(x=xcross_curve2)
    except NameError: print('curve2 is not passing the threshold')
    

    As expected, we get this plot:

    enter image description here

    (and a text output: curve2 is not passing the threshold.)

    If you would like to increase accuracy of xcross_curve1 or xcross_curve2, you can increase degree and n defined above.


    From Legendre to Polynomial form

    We have fitted a curve, which roughly has the form:

    enter image description here where P_n is the nth Legendre polynomial, s(x) is some function which transforms x to the range P_n expects (some math stuff which we don't need to know now).

    We want our fitted line in the form:

    enter image description here

    We'll use legendre() of scipy.special:

    from scipy.special import legendre
    

    We'll also use use np.pad (docs, good SO post).

    legendredict={}
    for icoef, coef in enumerate(legendrefit_curve1.coef):
        legendredict[icoef]=coef*np.pad(legendre(icoef).coef,(10-icoef,0),mode='constant')
    

    legendredict will hold keys from 0 to 10, and each value in the dict will be a list of floats. The key is refering to the degree of the Polynomial, and the list of floats are expressing what are the coefficients of x**n values within that constituent polynomial of our fit, in a backwards order.

    For example:

    P_4 is:

    enter image description here

    legendredict[4] is:

    isarray([ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
            0.00000000e+00,  0.00000000e+00,  3.29634565e+05,  3.65967884e-11,
           -2.82543913e+05,  1.82983942e-11,  2.82543913e+04])
    

    Meaning that in the sum of P_ns (f(x), above), we have q_4 lot of P_4, which is equivalent to having 2.82543913e+04 of 1s, 1.82983942e-11 of x, -2.82543913e+05 of x^2, etc, only from the P_4 component.

    So if we want to know how much 1s, xs, x^2s, etc we need to form the polynomial sum, we need to add the need for 1s, xs, x^2s, etcs from all the different P_ns. This is what we do:

    polycoeffs = np.sum(np.stack(list(legendredict.values())),axis=0)
    

    Then let's form a polynomial sum:

    for icoef, coef in enumerate(reversed(polycoeffs)):
        print(str(coef)+'*s(x)**'+str(icoef),end='\n +')
    

    Giving the output:

    -874.1456709637822*s(x)**0
     +2893.7228005540596*s(x)**1
     +50415.38472217957*s(x)**2
     +-6979.322584205707*s(x)**3
     +-453363.49985790614*s(x)**4
     +-250464.7549807652*s(x)**5
     +1250129.5521521813*s(x)**6
     +1267709.5031024509*s(x)**7
     +-493280.0177807359*s(x)**8
     +-795684.224334346*s(x)**9
     +-134370.1696946264*s(x)**10
     +
    

    (We are going to ignore the last + sign, formatting is not the main point here.)

    We need to calculate s(x) as well. If we are working in a Jupyter Notebook / Google Colab, only executing a cell with legendrefit_curve1 returns:

    enter image description here

    From where we can clearly see that s(x) is -1.0512820512820513+0.05128205128205128x. If we want to do it in a more programmatic way:

    2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0]) is 0.05128205128205128 & -1-2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0]) is just -1.0512820512820513

    Which is true for some mathamatical reasons not much relevant here (related Q).

    So we can define:

    def s(input):
        a=-1-2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0])
        b=2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0])
        return a+b*input
    

    Also, lets define, based on the above obtained sum of polynomials of s(x):

    def polyval(x):
        return -874.1456709637822*s(x)**0+2893.7228005540596*s(x)**1+50415.38472217957*s(x)**2+-6979.322584205707*s(x)**3+-453363.49985790614*s(x)**4+-250464.7549807652*s(x)**5+1250129.5521521813*s(x)**6+1267709.5031024509*s(x)**7+-493280.0177807359*s(x)**8+-795684.224334346*s(x)**9+-134370.1696946264*s(x)**10
    

    In a more programmatic way:

    def polyval(x):
        return sum([coef*s(x)**icoef for icoef, coef in enumerate(reversed(polycoeffs))])
    

    Check that our polynomial indeed fits:

    plt.scatter(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
    plt.plot(fitted_vals_curve1[0],[polyval(val) for val in fitted_vals_curve1[0]])
    

    It does:

    enter image description here

    So let's print out our pure polynomial sum, with s(x) being replaced by an explicit function:

    for icoef, coef in enumerate(reversed(polycoeffs)):
        print(str(coef)+'*(-1.0512820512820513+0512820512820513*x)**'+str(icoef),end='\n +')
    

    Giving the output:

    -874.1456709637822*(-1.0512820512820513+0512820512820513*x)**0
     +2893.7228005540596*(-1.0512820512820513+0512820512820513*x)**1
     +50415.38472217957*(-1.0512820512820513+0512820512820513*x)**2
     +-6979.322584205707*(-1.0512820512820513+0512820512820513*x)**3
     +-453363.49985790614*(-1.0512820512820513+0512820512820513*x)**4
     +-250464.7549807652*(-1.0512820512820513+0512820512820513*x)**5
     +1250129.5521521813*(-1.0512820512820513+0512820512820513*x)**6
     +1267709.5031024509*(-1.0512820512820513+0512820512820513*x)**7
     +-493280.0177807359*(-1.0512820512820513+0512820512820513*x)**8
     +-795684.224334346*(-1.0512820512820513+0512820512820513*x)**9
     +-134370.1696946264*(-1.0512820512820513+0512820512820513*x)**10
     +
    

    Which can be simplified, as desired. (Ignore the last + sign.)

    If want a higher (lower) degree polynomial fit, just fit higher (lower) degrees of Legendre polynomials.