I'm trying to fit my data with the following function:
The data that I'm using is the following:
X1:
0 1.0
1 101.0
2 201.0
3 301.0
4 401.0
5 501.0
6 601.0
7 701.0
8 801.0
9 901.0
10 1001.0
11 1101.0
12 1201.0
13 1301.0
14 1401.0
15 1501.0
16 1601.0
17 1701.0
18 1801.0
19 1901.0
Y1:
0 0.121159
1 0.195525
2 0.167305
3 0.125499
4 0.094138
5 0.071610
6 0.053848
7 0.039890
8 0.031099
9 0.023976
10 0.018271
11 0.013807
12 0.010596
13 0.008033
14 0.006710
15 0.005222
16 0.004299
17 0.003376
18 0.002976
19 0.002659
My code to call the function looks as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def logN(X1, mu, SD1):
    return A / X1 * np.exp(-0.5 * (np.log(X1 / mu) ** 2 / np.log(SD1) ** 2))

params, pcov = curve_fit(logN, X1, Y1)
print(params)
plt.plot(X1, Y1, "o")
plt.plot(X1, logN(X1, params[0], params[1]))
plt.show()
The fit returns parameters all equal to 1, and I get the following warning:
minpack.py:829: OptimizeWarning: Covariance of the parameters could not be estimated
I'm wondering whether I'm calling the function properly or the syntax of my function is wrong. Any ideas?
You are facing multiple challenges:
scipy.optimize.curve_fit does not know about your A coefficient: it is not a parameter of the optimization, it is only referenced inside your function (thus it takes a global value that is not detailed in your post), and this A coefficient is related to sigma, as the former incorporates the latter.
Some points seem suspect (the first point at x=1 in particular) and there is no estimate of the y uncertainty. This may prevent proper convergence when performing the parameter optimization, and then the algorithm fails to compute the covariance matrix.
Improvement proposals:
It is possible to rewrite your problem as a classical OLS fit of a second-order polynomial; then we do not have to rely on NLLS algorithms. Just apply the log-log transform to your relation to confirm it becomes quadratic and to get the parameter conversion formulas. Always prefer OLS over NLLS when available.
Remove or penalize (with weights) suspect points, ideally using an objective criterion.
Adapt your model function (not considered here).
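To make the first proposal concrete: taking the log of the model y = (A/x) * exp(-(ln(x/mu)/ln(sigma))**2 / 2) gives ln y = ln A - ln x - (ln x - ln mu)**2 / (2 * ln(sigma)**2), which is a second-order polynomial in t = ln x. Here is a minimal numerical check of this expansion (the coefficient names c0, c1, c2 are my own labels):

```python
import numpy as np

def model(x, A, mu, sigma):
    # Original model: y = (A/x) * exp(-(ln(x/mu)/ln(sigma))**2 / 2)
    return (A / x) * np.exp(-0.5 * (np.log(x / mu) / np.log(sigma)) ** 2)

def poly_coeffs(A, mu, sigma):
    # Log-log expansion: ln y = c0 + c1*t + c2*t**2 with t = ln x
    m, s = np.log(mu), np.log(sigma)
    c2 = -1.0 / (2.0 * s ** 2)
    c1 = m / s ** 2 - 1.0
    c0 = np.log(A) - m ** 2 / (2.0 * s ** 2)
    return c0, c1, c2

# Check: both forms agree at a few sample points
A, mu, sigma = 50.0, 100.0, 5.0
x = np.array([10.0, 100.0, 1000.0])
c0, c1, c2 = poly_coeffs(A, mu, sigma)
t = np.log(x)
assert np.allclose(np.log(model(x, A, mu, sigma)), c0 + c1 * t + c2 * t ** 2)
```

This is exactly why an OLS fit of a quadratic in log-log space recovers the same information as the NLLS fit.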
Based on the data you provided:
import io
import numpy as np
from scipy import optimize
import pandas as pd
import matplotlib.pyplot as plt
data = io.StringIO("""id;x;y;sy
0;1.0;0.121159;1
1;101.0;0.195525;1
2;201.0;0.167305;1
3;301.0;0.125499;1
4;401.0;0.094138;1
5;501.0;0.071610;1
6;601.0;0.053848;1
7;701.0;0.039890;1
8;801.0;0.031099;1
9;901.0;0.023976;1
10;1001.0;0.018271;1
11;1101.0;0.013807;1
12;1201.0;0.010596;1
13;1301.0;0.008033;1
14;1401.0;0.006710;1
15;1501.0;0.005222;1
16;1601.0;0.004299;1
17;1701.0;0.003376;1
18;1801.0;0.002976;1
19;1901.0;0.002659;1
""")
df = pd.read_csv(data, sep=";", index_col="id")
Rewriting your model function to:
def func(x, A, mu, sigma):
    return (A / x) * np.exp(-((np.log(x / mu) / np.log(sigma)) ** 2) / 2)
Then we can naively fit the function by providing data and smart enough initial condition to the optimization algorithm:
popt, pcov = optimize.curve_fit(func, df.x, df.y, sigma=df.sy,
                                p0=(50, 100, 0.1), method="lm")
But the result is not very satisfactory (unweighted):
And prone to variation because of the suspect points (penalizing x=1 with w=100):
Thus having uncertainties on the y measurements could help to adjust the fit.
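One way to apply the w=100 penalty mentioned above without editing the data is through the sigma argument of curve_fit, which takes per-point standard deviations: a larger sigma means a smaller weight. A self-contained sketch using the question's data:

```python
import numpy as np
from scipy import optimize

def func(x, A, mu, sigma):
    # Same model: y = (A/x) * exp(-(ln(x/mu)/ln(sigma))**2 / 2)
    return (A / x) * np.exp(-0.5 * (np.log(x / mu) / np.log(sigma)) ** 2)

# Data from the question
x = np.arange(1.0, 2000.0, 100.0)
y = np.array([0.121159, 0.195525, 0.167305, 0.125499, 0.094138,
              0.071610, 0.053848, 0.039890, 0.031099, 0.023976,
              0.018271, 0.013807, 0.010596, 0.008033, 0.006710,
              0.005222, 0.004299, 0.003376, 0.002976, 0.002659])

# Down-weight the suspect first point (x=1): inflating its standard
# deviation by a factor of 100 reduces its weight by a factor of 100**2.
sy = np.ones_like(y)
sy[0] = 100.0

popt, pcov = optimize.curve_fit(func, x, y, sigma=sy,
                                p0=(50, 100, 0.1), method="lm")
```

The rest of the data then dominates the loss, so the outlier barely influences the fitted parameters.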
Anyway, since the problem can be linearized, we should rely on this property; weights can be used in OLS as well.
You can perform OLS with scipy.optimize.least_squares if you wish. I will use the sklearn framework, which is very handy:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
Let's remove the first suspect point:
df = df.loc[1:,:]
Then we adapt the inputs and perform the log-log transformation:
X = np.log(df.x.values).reshape(-1, 1)
y = np.log(df.y)
We create the OLS pipeline for a second order polynomial:
poly = PolynomialFeatures(2)
linreg = LinearRegression()
model = make_pipeline(poly, linreg)
And finally we adjust the model to data:
model.fit(X, y)
model.score(X, y) # 0.9982242621455882
It leads to a reasonable adjustment for a quadratic. Then it is just a matter of converting the coefficients back into your desired quantities.
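Converting back is just the inversion of the log-log expansion. If the fitted quadratic is ln y ≈ c0 + c1*ln x + c2*(ln x)**2 (with the pipeline above, read the intercept and coefficients from the LinearRegression step; note that PolynomialFeatures adds a bias column, so check which coefficient maps to which power), a sketch of the inversion (the helper name convert_coeffs is mine):

```python
import numpy as np

def convert_coeffs(c0, c1, c2):
    """Recover (A, mu, sigma) from the fitted quadratic
    ln y = c0 + c1*ln x + c2*(ln x)**2 (requires c2 < 0).
    Only ln(sigma)**2 enters the model, so sigma and 1/sigma
    give the same curve; we take the root with sigma > 1."""
    s2 = -1.0 / (2.0 * c2)           # s2 = ln(sigma)**2
    m = (c1 + 1.0) * s2              # m  = ln(mu)
    A = np.exp(c0 + m ** 2 / (2.0 * s2))
    return A, np.exp(m), np.exp(np.sqrt(s2))

# Round-trip check against known parameters
A0, mu0, sg0 = 50.0, 100.0, 5.0
m, s = np.log(mu0), np.log(sg0)
c2 = -1.0 / (2.0 * s ** 2)
c1 = m / s ** 2 - 1.0
c0 = np.log(A0) - m ** 2 / (2.0 * s ** 2)
assert np.allclose(convert_coeffs(c0, c1, c2), (A0, mu0, sg0))
```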