Search code examples
pythonscipycurve-fittingconfidence-intervalscipy-optimize

Prediction Intervals for Hyperbolic Curve_Fit - SciPy


How can I get prediction intervals / prediction bands with a SciPy's curve fitting function?

More specifically, how do I get these prediction bands for the hyperbolic curve typically used in decline curve analysis?

Any help would be appreciated.

import pandas as pd
import numpy as np
from datetime import timedelta
from scipy.optimize import curve_fit

def hyperbolic_equation(t, qi, b, di):
    return qi/((1.0+b*di*t)**(1.0/b))


df1 = pd.DataFrame({ 'cumsum_days': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
        'prod': [800, 900, 1200, 700, 600, 
                 550, 500, 650, 625, 600,
                 550, 525, 500, 400, 350]})

qi = max(df1['prod'])

#Hyperbolic curve fit the data to get best fit equation
popt_hyp, pcov_hyp = curve_fit(hyperbolic_equation, df1['cumsum_days'], df1['prod'],bounds=(0, [qi,1,20]))

#Passing t to estimate the coefficients:

def fitted_hyperbolic_equation(t):
    return popt_hyp[0]/((1.0+popt_hyp[1]*popt_hyp[2]*t)**(1.0/popt_hyp[1]))

#Creating future time to predict on:
df2 = pd.DataFrame({ 'future_days': [16,17,18,19,20]})

fitted_hyperbolic_equation(df2.future_days)

16    388.259631
17    368.389649
18    349.754534
19    332.264306
20    315.836485

I have my future values, but how do I generate confidence / prediction bands (95%) with SciPy? Any help would be appreciated.


Solution

  • I'm not sure I fully understand, but I think you are asking for uncertainties in the predicted values from a curve fitting model.

    I would recommend using lmfit for this (disclaimer: I am an author) as it provides methods to do such calculations. I'm afraid your model and data do not match very well so that the uncertainties are very large

    Using lmfit and using plain numpy array instead of pandas dataframes (those can be used, but they're a distraction here - the fits really need numpy arrays), your analysis might look like this:

    import numpy as np
    from lmfit import Model
    import matplotlib.pyplot as plt
    
    cumsum_days = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
    prod = np.array([800, 900, 1200, 700, 600, 550, 500, 650, 625, 600, 550,
                     525, 500, 400, 350])
    
    # plot data
    plt.plot(cumsum_days, prod, 'bo', label='data')
    
    def hyperbolic_equation(t, qi, b, di):
        return qi/((1.0+b*di*t)**(1.0/max(b, 1.e-50)))
    
    # build Model
    hmodel = Model(hyperbolic_equation)
    
    # create lmfit Parameters, named from the arguments of `hyperbolic_equation`
    # note that you really must provide initial values.
    params = hmodel.make_params(qi=1000, b=0.5, di=0.1)
    
    # set bounds on parameters
    params['qi'].min=0
    params['b'].min=0
    params['di'].min=0
    
    # do fit, print resulting parameters
    result = hmodel.fit(prod, params, t=cumsum_days)
    print(result.fit_report())
    
    # plot best fit: not that great of fit, really
    plt.plot(cumsum_days, result.best_fit, 'r--', label='fit')
    
    # calculate the (1 sigma) uncertainty in the predicted model
    # and plot that as a confidence band
    dprod = result.eval_uncertainty(result.params, sigma=1)   
    plt.fill_between(cumsum_days,
                     result.best_fit-dprod,
                     result.best_fit+dprod,
                     color="#AB8888",
                     label='uncertainty band of fit')
    
    # now evaluate the model for other values, predicting future values
    future_days = np.array([16,17,18,19,20])
    future_prod = result.eval(t=future_days)
    
    plt.plot(future_days, future_prod, 'k--', label='prediction')
    
    # ...and calculate the 1-sigma uncertainty in the future prediction
    # for 95% confidence level, you'd want to use `sigma=2` here:
    future_dprod = result.eval_uncertainty(t=future_days, sigma=1)
    
    print("### Prediction\n# Day  Prod     Uncertainty")
    
    for day, prod, eps in zip(future_days, future_prod, future_dprod):
        print(" {:.1f}   {:.1f} +/- {:.1f}".format(day, prod, eps))
    
    plt.fill_between(future_days,
                     future_prod-future_dprod,
                     future_prod+future_dprod,
                     color="#ABABAB",
                     label='uncertainty band of prediction')
    
    plt.legend(loc='lower left')
    plt.show()
    

    This will print out resulting fit statistics and parameter values of

    [[Model]]
        Model(hyperbolic_equation)
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 21
        # data points      = 15
        # variables        = 3
        chi-square         = 238946.482
        reduced chi-square = 19912.2068
        Akaike info crit   = 151.139170
        Bayesian info crit = 153.263321
    [[Variables]]
        qi:  993.608482 +/- 163.710950 (16.48%) (init = 1000)
        b:   0.22855837 +/- 2.07615175 (908.37%) (init = 0.5)
        di:  0.06551315 +/- 0.06250023 (95.40%) (init = 0.1)
    [[Correlations]] (unreported correlations are < 0.100)
        C(b, di)  =  0.963
        C(qi, di) =  0.888
        C(qi, b)  =  0.771
    ### Prediction
    # Day  Prod     Uncertainty
     16.0   388.258 +/- 1080.106
     17.0   368.387 +/- 1106.336
     18.0   349.752 +/- 1130.091
     19.0   332.261 +/- 1151.634
     20.0   315.833 +/- 1171.196
    

    and give a plot like this:

    enter image description here

    In your question, you did not inspect the quality of the fit either statistically or graphically. Really, you will want to do this.

    You also used curve_fit without providing initial values. Although no underlying fitting routine will ever support that and all require explicit initial values, curve_fit permits this without warning or justification and asserts that all starting values will be 1.0. Really, you have to provide initial values.