I am trying to fit an exponential CDF to my data, both to see whether it is a good fit and to develop an equation from the fit, but I am not sure how, since I think scipy.stats fits the PDF, not the CDF. Given the data below:
eta = [1,0.5,0.3,0.25,0.2];
q = [1e-9,9.9981e-10,9.9504e-10,9.7905e-10,9.492e-10];
How do I fit an exponential CDF to the data? Or how do I find the distribution that best fits the data?
You can define a general exponential function and use curve_fit from scipy.optimize:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def exp_func(x, a, b, c):
    return a * np.exp(-b * x) + c
eta = np.array([1,0.5,0.3,0.25,0.2])
cdf = np.array([1e-9,9.9981e-10,9.9504e-10,9.7905e-10,9.492e-10])
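popt, pcov = curve_fit(exp_func, eta, cdf)  # popt holds the best-fit a, b, c
plt.plot(eta, cdf, 'o', label='data')
# %g keeps parameters of order 1e-9 visible (%5.3f would print them as 0.000)
plt.plot(eta, exp_func(eta, *popt), 'r-', label='fit: a=%.3g, b=%.3g, c=%.3g' % tuple(popt))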
plt.legend()
plt.show()
You'll get an exponential curve that closely follows your values.
From the fitted parameters, you can see the function is y=np.exp(-19.213 * x).
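Since the question asks specifically for an exponential CDF, here is a minimal sketch that fits F(x) = 1 - exp(-lam * x) (the exponential CDF with loc = 0) and reports R² as a rough goodness-of-fit measure. It assumes, which the question does not state, that q is a CDF multiplied by a constant, so q is divided by its maximum first; this also keeps the values near 1, which tends to make curve_fit better behaved than values of order 1e-9. The names exp_cdf and lam are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import expon


def exp_cdf(x, lam):
    # Exponential CDF with rate lam and loc = 0: F(x) = 1 - exp(-lam * x)
    return 1.0 - np.exp(-lam * x)


eta = np.array([1, 0.5, 0.3, 0.25, 0.2])
q = np.array([1e-9, 9.9981e-10, 9.9504e-10, 9.7905e-10, 9.492e-10])

# Assumption: q is a CDF times an unknown constant, so rescale it to end at 1
q_norm = q / q.max()

# p0 gives curve_fit a starting rate of roughly the right order of magnitude
popt, pcov = curve_fit(exp_cdf, eta, q_norm, p0=[10.0])
lam = popt[0]
lam_err = np.sqrt(pcov[0, 0])  # 1-sigma uncertainty estimate from curve_fit

# Coefficient of determination: 1 - SS_res / SS_tot
residuals = q_norm - exp_cdf(eta, lam)
r2 = 1.0 - np.sum(residuals**2) / np.sum((q_norm - q_norm.mean())**2)

# The same curve in scipy.stats terms: expon uses scale = 1 / rate
fitted_cdf = expon.cdf(eta, scale=1.0 / lam)

print(f"lam = {lam:.3f} +/- {lam_err:.3f}, R^2 = {r2:.4f}")
```

With only five points, R² is a rough indicator at best; it says how closely this curve tracks the data, not whether an exponential is the right family.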
* Update *
If you want to make sure this is really a CDF, you'll need to calculate the PDF by taking its derivative (here approximated with finite differences):
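x = np.linspace(0, 1, 1000)
cdf_fit = exp_func(x, *popt)  # fitted curve evaluated on a fine grid
# Finite differences; prepending cdf_fit[0] lets np.cumsum rebuild cdf_fit exactly
cdf_diff = np.r_[cdf_fit[0], np.diff(cdf_fit)]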
You can do a sanity check:
plt.plot(x, np.cumsum(cdf_diff))  # should retrace cdf_fit
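Note that np.diff returns the change in the CDF over each grid step, which is roughly pdf * dx rather than the PDF itself; with 1000 points on [0, 1], those values are about 1000 times smaller than the density. A minimal sketch on an exact exponential CDF (the rate of 15 is a hypothetical value chosen for illustration) showing that np.gradient, which divides by the spacing, recovers the density given by expon.pdf:

```python
import numpy as np
from scipy.stats import expon

lam = 15.0  # hypothetical rate, for illustration only
x = np.linspace(0, 1, 1000)
cdf = expon.cdf(x, scale=1.0 / lam)  # exact exponential CDF on the grid
dx = x[1] - x[0]

# np.diff: probability mass in each step, approximately pdf * dx
mass_per_step = np.diff(cdf)

# np.gradient divides by the spacing, giving a density estimate
density = np.gradient(cdf, x, edge_order=2)
pdf_exact = expon.pdf(x, scale=1.0 / lam)

print("max |density - pdf|:", np.max(np.abs(density - pdf_exact)))
print("mass_per_step / dx near x=0:", mass_per_step[0] / dx, "vs pdf", pdf_exact[0])
```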
And then use scipy to fit the PDF to an exponential distribution:
from scipy.stats import expon
params = expon.fit(cdf_diff)
pdf_fit = expon.pdf(x, *params)
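For comparison, here is a minimal sketch of how expon.fit is normally used: it takes observations drawn from the distribution and returns maximum-likelihood (loc, scale) estimates, where scale = 1 / rate. The data here are synthetic; the rate of 15, the sample size and the seed are arbitrary illustrative choices, and floc=0 fixes the location so only the scale is estimated.

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(0)
true_scale = 1.0 / 15.0  # hypothetical scale (1 / rate), for illustration only

# expon.fit expects observations drawn from the distribution...
samples = expon.rvs(scale=true_scale, size=5000, random_state=rng)

# ...and returns maximum-likelihood estimates; floc=0 keeps loc fixed at zero
loc, scale = expon.fit(samples, floc=0)
print(f"fitted scale = {scale:.4f} (true {true_scale:.4f}), rate = {1 / scale:.2f}")

# Evaluate the fitted PDF on a grid, as in the code above
x = np.linspace(0, 1, 1000)
pdf_fit = expon.pdf(x, loc=loc, scale=scale)
```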
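I must warn you that something doesn't add up: pdf_fit doesn't align with cdf_diff. Part of the problem is that expon.fit performs a maximum-likelihood fit to raw observations, not a curve fit through PDF values. Beyond that, maybe your CDF isn't a real distribution function? The last value of a CDF should be 1, but your q values top out at 1e-9.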
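The conditions mentioned above can be checked directly. A minimal sketch, using a hypothetical helper looks_like_cdf (not a library function) that sorts the points by eta and tests whether the values never decrease, stay within [0, 1], and have a last value of (numerically) 1, the criterion used above. Dividing by q.max() assumes q is a CDF multiplied by an unknown constant; the data alone don't prove that.

```python
import numpy as np


def looks_like_cdf(x, F, tol=1e-6):
    """Hypothetical helper: rough check that samples F(x) could come from a CDF."""
    order = np.argsort(x)
    F_sorted = np.asarray(F)[order]
    non_decreasing = bool(np.all(np.diff(F_sorted) >= 0))
    in_range = bool(np.all((F_sorted >= 0) & (F_sorted <= 1 + tol)))
    reaches_one = bool(abs(F_sorted[-1] - 1.0) <= tol)
    return non_decreasing, in_range, reaches_one


eta = np.array([1, 0.5, 0.3, 0.25, 0.2])
q = np.array([1e-9, 9.9981e-10, 9.9504e-10, 9.7905e-10, 9.492e-10])

print(looks_like_cdf(eta, q))            # (True, True, False): raw data ends at 1e-9
print(looks_like_cdf(eta, q / q.max()))  # (True, True, True) after rescaling
```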