I have to solve the following problem:
I hope I have described the problem clearly; any help would be greatly appreciated!
Question edited (to clear up a misunderstanding, 2020-04-04)
I'll try to be more specific now. I have attached a picture showing an example of the "curve family", which changes for different "Sigma". I want to describe this curve family with one set of constants C1, C2, C3 and C4 that stays fixed across the whole family. The goal is to find optimal constants such that the equation covers every curve in the family by varying only Sigma and T. Therefore I have to fit the parameters against a bunch of curves with minimal error; afterwards the equation should describe the whole family of curves by changing only Sigma and T.
Best Regards!
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

# Equation --> Eps_Cr = (C1*Sigma**C2*x**(C3+1)*exp(-C4/T))/(C3+1)
def func(x, C1, C2, C3, C4):
    Sigma = 20
    T = 1
    return (C1*Sigma**C2*x**(C3+1)*np.exp(-C4/T))/(C3+1)

# Example Data 1
xdata = np.array([1, 10, 100, 1000, 10000, 100000])
ydata = [0.000382, 0.000407, 0.000658, 0.001169, 0.002205, 0.004304]
# Example Data 2
xdata1 = [1, 10, 100, 1000, 10000, 100000]
ydata1 = [0.002164, 0.002371, 0.004441, 0.008571, 0.016811, 0.033261]
# Example Data 3
xdata2 = [1, 10, 100, 1000, 10000, 100000]
ydata2 = [0.001332, 0.001457, 0.002707, 0.005157, 0.010007, 0.019597]

plt.plot(xdata, ydata, 'b-', label='data 1')
plt.plot(xdata1, ydata1, 'g-', label='data 2')
plt.plot(xdata2, ydata2, 'y-', label='data 3')

popt, pcov = curve_fit(func, xdata, ydata)
# xdata must be an ndarray here; a plain list would fail on x**(C3+1)
plt.plot(xdata, func(xdata, *popt), 'r--',
         label='fit: C1=%5.2e, C2=%5.3f, C3=%5.3f, C4=%5.3f' % tuple(popt))
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
From the extra information you provided in the 'answer', it seems that you want to fit what statisticians often call a hierarchical model. Some parameters are shared between all data points (the constants C1 to C4), and some parameters are shared within groups of data points (T and Sigma). All of these parameters need to be estimated from the data.
This is often tackled by building one larger model for all the data, and inside the model selecting which of the group-wise parameters to use: if a data point belongs to data group 1, we choose Sigma1 and T1, and so on.
Since you are already using curve_fit, I made a version of your code that does the job. The code style leaves a bit to be desired, since I'm no expert in scipy, but I think you will at least understand the method.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit

def func(x_and_grp, C1, C2, C3, C4, Sigma0, Sigma1, Sigma2, T0, T1, T2):
    # We estimate one Sigma and one T per group of data points
    x = x_and_grp[:, 0]
    grp_id = x_and_grp[:, 1]
    # Select the appropriate T and Sigma for each data point based on its group id
    T = np.array([[T0, T1, T2][int(gid)] for gid in grp_id])
    Sigma = np.array([[Sigma0, Sigma1, Sigma2][int(gid)] for gid in grp_id])
    return (C1*Sigma**C2*x**(C3+1)*np.exp(-C4/T))/(C3+1)

# Example data in 3 groups
xdata0 = [1, 10, 100, 1000, 10000, 100000]
ydata0 = [0.000382, 0.000407, 0.000658, 0.001169, 0.002205, 0.004304]
xdata1 = [1, 10, 100, 1000, 10000, 100000]
ydata1 = [0.002164, 0.002371, 0.004441, 0.008571, 0.016811, 0.033261]
xdata2 = [1, 10, 100, 1000, 10000, 100000]
ydata2 = [0.001332, 0.001457, 0.002707, 0.005157, 0.010007, 0.019597]

# Merge all the data and append the group id to the x-data vectors
y_all = np.concatenate([ydata0, ydata1, ydata2])
x_and_grp_all = np.zeros(shape=(3 * 6, 2))
x_and_grp_all[:, 0] = np.concatenate([xdata0, xdata1, xdata2])
x_and_grp_all[0:6, 1] = 0
x_and_grp_all[6:12, 1] = 1
x_and_grp_all[12:18, 1] = 2

# Fit one model to all the data together
popt, pcov = curve_fit(func, x_and_grp_all, y_all)

xspace = np.logspace(1, 5)
plt.plot(xdata0, ydata0, 'b-', label='data 0')
plt.plot(xdata1, ydata1, 'g-', label='data 1')
plt.plot(xdata2, ydata2, 'y-', label='data 2')
for gid, color in zip([0, 1, 2], ['r', 'k', 'purple']):
    # Parameter order in popt: C1..C4, Sigma0..Sigma2, T0..T2
    Sigma = popt[4 + gid]
    T = popt[7 + gid]
    x_and_grp = np.column_stack([xspace, np.ones_like(xspace) * gid])
    plt.plot(xspace,
             func(x_and_grp, *popt),
             linestyle='dashed', color=color,
             label='fit: T=%5.2e, Sigma=%5.3f' % (T, Sigma))
plt.xlabel('X')
plt.ylabel('Y')
plt.title('fit: C1=%5.2e, C2=%5.3f, C3=%5.3f, C4=%5.3f' % tuple(popt[0:4]))
plt.legend()
plt.show()
Finally, I want to add that curve_fit is not that well suited for this task if you have many different groups. Consider another library that handles this case; statsmodels could be an option. One alternative is to reach for scipy.optimize.minimize instead, since it gives you more flexibility, although you then need to do the confidence-interval estimation manually.
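As a rough sketch of that alternative: the snippet below minimizes the summed squared residuals of the same model with scipy.optimize.minimize. The parameter packing (C1..C4 followed by one Sigma and one T per group) and the Nelder-Mead settings are my own illustrative choices, not part of the code above.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1, 10, 100, 1000, 10000, 100000], dtype=float)
y_groups = [
    np.array([0.000382, 0.000407, 0.000658, 0.001169, 0.002205, 0.004304]),
    np.array([0.002164, 0.002371, 0.004441, 0.008571, 0.016811, 0.033261]),
    np.array([0.001332, 0.001457, 0.002707, 0.005157, 0.010007, 0.019597]),
]

def model(x, C1, C2, C3, C4, Sigma, T):
    return (C1 * Sigma**C2 * x**(C3 + 1) * np.exp(-C4 / T)) / (C3 + 1)

def loss(params):
    # Packing (illustrative choice): C1..C4, then Sigma per group, then T per group
    C1, C2, C3, C4 = params[:4]
    sse = 0.0
    for gid, y in enumerate(y_groups):
        Sigma = params[4 + gid]
        T = params[4 + len(y_groups) + gid]
        resid = model(x, C1, C2, C3, C4, Sigma, T) - y
        sse += np.sum(resid**2)
    return sse

p0 = np.ones(4 + 2 * len(y_groups))  # crude starting guess, all ones
res = minimize(loss, p0, method='Nelder-Mead',
               options={'maxiter': 20000, 'xatol': 1e-10, 'fatol': 1e-12})
print(res.x[:4])  # shared constants C1..C4
print(res.fun)    # final sum of squared errors
```

With minimize you can freely add bounds, regularization, or per-group weights to the loss, which is exactly the flexibility curve_fit lacks.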
I also want to add that the method above is overly complicated if you already know T and Sigma for each group of data. In that case, we add the relevant values of Sigma and T to the x-vector instead of a group id.
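To illustrate that simpler variant, here is a minimal sketch in which Sigma and T travel as extra columns of the x-"vector", so only C1..C4 are fitted. The (Sigma, T) pairs and the constants used to generate the synthetic y values are placeholders I made up for the demonstration.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x_sigma_T, C1, C2, C3, C4):
    # Columns: x, Sigma, T -- Sigma and T are data here, not fit parameters
    x = x_sigma_T[:, 0]
    Sigma = x_sigma_T[:, 1]
    T = x_sigma_T[:, 2]
    return (C1 * Sigma**C2 * x**(C3 + 1) * np.exp(-C4 / T)) / (C3 + 1)

x = np.array([1, 10, 100, 1000, 10000, 100000], dtype=float)
# (Sigma, T) assumed known for each group -- placeholder values
groups = [(2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]

# Stack one row per data point with columns x / Sigma / T
X = np.vstack([np.column_stack([x, np.full_like(x, S), np.full_like(x, T)])
               for S, T in groups])

# Synthetic y from made-up constants, just to show the fit recovers a curve
true_params = (1.2, 0.8, -0.5, 0.3)
y = func(X, *true_params)

popt, pcov = curve_fit(func, X, y, maxfev=10000)
print(popt)  # fitted C1..C4
```

Only four parameters remain, so the fit is much better conditioned than the ten-parameter hierarchical version.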