Tags: python, scipy, distribution, data-fitting

Fit an exponential CDF to data in Python?


I am trying to fit an exponential CDF to my data to see if it is a good fit/develop an equation from the fit, but am not sure how since I think scipy.stats fits the PDF, not the CDF. If I have the data below:

eta = [1,0.5,0.3,0.25,0.2];
q = [1e-9,9.9981e-10,9.9504e-10,9.7905e-10,9.492e-10];

How do I fit an exponential CDF to the data? Or how do I find the distribution that fits the data best?


Solution

  • You can define a general exp function, and use curve_fit from scipy.optimize:

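    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.optimize import curve_fit
    
    def exp_func(x, a, b, c):
        # General exponential: amplitude a, rate b, offset c
        return a * np.exp(-b * x) + c
    
    eta = np.array([1, 0.5, 0.3, 0.25, 0.2])
    cdf = np.array([1e-9, 9.9981e-10, 9.9504e-10, 9.7905e-10, 9.492e-10])
    # popt holds the fitted (a, b, c); pcov is their estimated covariance
    popt, pcov = curve_fit(exp_func, eta, cdf)
    # Plot the raw points as markers and the fitted curve as a line
    plt.plot(eta, cdf, 'o', label='data')
    # The data are on the order of 1e-9, so a and c would round to zero with %5.3f
    plt.plot(eta, exp_func(eta, *popt), 'r-', label='fit: a=%.3e, b=%5.3f, c=%.3e' % tuple(popt))
    plt.legend()
    plt.show()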
    

    And you'll get an exponential function that closely matches your values.

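    From the fitted parameters, the decay rate is b = 19.213, so the fitted function is y = a * np.exp(-19.213 * x) + c, where a and c are on the same tiny scale as the data (about 1e-9).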
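
Because the values are on the order of 1e-9, the optimizer can struggle with the default starting point of all ones. A minimal sketch of one common workaround follows: rescale the y-values to order 1 before fitting, then undo the scaling on a and c. The factor `scale` and the starting guess `p0` are assumptions chosen for illustration, not values from the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_func(x, a, b, c):
    return a * np.exp(-b * x) + c

eta = np.array([1, 0.5, 0.3, 0.25, 0.2])
q = np.array([1e-9, 9.9981e-10, 9.9504e-10, 9.7905e-10, 9.492e-10])

# Assumed scaling factor so that the fitted values are of order 1
scale = 1e9
q_scaled = q * scale

# Hypothetical starting guess: negative amplitude, fast decay, plateau near 1
p0 = (-1.0, 10.0, 1.0)
popt, pcov = curve_fit(exp_func, eta, q_scaled, p0=p0)
a, b, c = popt

# Rescaling y only affects a and c; the rate b is unchanged
a_orig, c_orig = a / scale, c / scale

residuals = q_scaled - exp_func(eta, *popt)
print(f"a={a_orig:.3e}, b={b:.3f}, c={c_orig:.3e}")
print("max |residual| in scaled units:", np.max(np.abs(residuals)))
```

Printing in scientific notation keeps a and c visible, and the residuals give a quick numerical check of how close the fit is.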

    * Update *

    If you want to make sure this is really a CDF, you'll need to calculate the PDF by taking the derivative, approximated here by finite differences of the fitted curve:

    x = np.linspace(0, 1, 1000)
    cdf_fit = exp_func(x, *popt)
    cdf_diff = np.r_[cdf_fit[0], np.diff(cdf_fit)]
    

    You can do a sanity check:

    plt.plot(x, np.cumsum(cdf_diff))
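
Note that np.diff on its own gives the probability increment in each bin, not a density. As a rough sketch of the difference, the example below uses a known exponential CDF (the rate `lam` is a hypothetical value, not taken from the question) and compares np.gradient, which divides by the grid spacing, against the exact PDF.

```python
import numpy as np

lam = 2.0  # hypothetical rate, chosen only for illustration
x = np.linspace(0, 5, 1001)
cdf = 1.0 - np.exp(-lam * x)

# Per-bin probability mass: summing these recovers the change in the CDF
increments = np.diff(cdf)

# Numerical derivative dF/dx: this is the density
density = np.gradient(cdf, x)

exact_pdf = lam * np.exp(-lam * x)
# Compare on interior points, where np.gradient uses central differences
max_err = np.max(np.abs(density[1:-1] - exact_pdf[1:-1]))
print("max interior error of numerical PDF:", max_err)
print("sum of increments:", increments.sum())
```

The increments depend on the grid spacing, while the density does not, so only the latter is comparable with expon.pdf.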
    

    And then use scipy to fit the PDF to an exponential distribution:

    from scipy.stats import expon
    params = expon.fit(cdf_diff)
    pdf_fit = expon.pdf(x, *params)
    

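    I must warn you that something doesn't add up: pdf_fit doesn't align with cdf_diff. One likely reason is that expon.fit expects raw samples drawn from the distribution, not PDF values. It is also possible that your CDF isn't a real distribution function: a CDF should be non-decreasing and approach 1, whereas your values top out at 1e-9.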
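
To illustrate the two kinds of input, here is a hedged sketch using synthetic data (the seed, `true_scale`, and the grid `x_pts` are assumptions, not the asker's data). scipy.stats fits take raw samples, while (x, F(x)) points can be fitted with curve_fit using expon.cdf as the model.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import expon

rng = np.random.default_rng(0)
true_scale = 0.5  # hypothetical scale (1 / rate) for the synthetic data

# Case 1: raw samples -> maximum-likelihood fit with the location fixed at 0
samples = rng.exponential(scale=true_scale, size=5000)
loc_hat, scale_hat = expon.fit(samples, floc=0)

# Case 2: points on the CDF curve -> least-squares fit of the CDF itself
x_pts = np.linspace(0.1, 3.0, 20)
F_pts = expon.cdf(x_pts, scale=true_scale)

def expon_cdf(x, scale):
    # Exponential CDF with location 0: 1 - exp(-x / scale)
    return expon.cdf(x, loc=0, scale=scale)

# Bounds keep the scale strictly positive during the optimization
popt_cdf, _ = curve_fit(expon_cdf, x_pts, F_pts, p0=[1.0], bounds=(1e-6, np.inf))
scale_cdf = popt_cdf[0]

print("scale from samples:", scale_hat)
print("scale from CDF points:", scale_cdf)
```

The second case only makes sense when the y-values are genuine probabilities between 0 and 1, which is the concern raised above about the original data.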