Search code examples
pythonnumpyscipycurve-fittingnumerical-integration

Fitting data to a complementary error function with multiple variables in Python


I am having trouble to fit experimental data to a complementary error function in Python 3.7.4.

import matplotlib.pyplot as plt
import math 
import numpy as np
from scipy import optimize
from scipy import integrate

with open('C:Path\\Data\\test.txt', 'r') as f:
    lines = f.readlines()
    x = [float(line.split(',')[0]) for line in lines]
    y = [float(line.split(',')[1]) for line in lines]


int_start = 35
int_end = 75

start = float(x[int_start])
end = float(x[int_end]) 
  
x_data = np.linspace(start, end, (int_end-int_start)+1)
y_data = y[int_start: int_end+1]


def integrand(x, a, b, c):
    return a*np.exp(((-1)*(x-b)**2)/(2*(c**2)))

def cerf(x, a, b, c):
    return integrate.quad(integrand, x, np.inf)

params, params_covariance = optimize.curve_fit(cerf, x_data, y_data)

plt.plot(x_data, y_data, 'x', label='Data')
plt.plot(x_data, integrand(x_data, params[0], params[1], params[2]), '-', label="fit")

plt.legend(loc='best')
plt.show()

More precisely, I want to fit my data to the complementary error function consisting of the integrand function with the parameters a, b, c, and the cerf function doing the actual integration. The integration should go from x (the argument of the function) to +infinity. Afterwards, I wanted to use standard curve_fit from scipy. But now I am getting a value error as follows:

> ValueError                                Traceback (most recent call last)
<ipython-input-33-8130a3eb44bb> in <module>
     29     return integrate.quad(integrand, x, np.inf)
     30 
---> 31 params, params_covariance = optimize.curve_fit(cerf, x_data, y_data)

~\AppData\Roaming\Python\Python37\site-packages\scipy\integrate\quadpack.py in quad(func, a, b, args, full_output, epsabs, epsrel, limit, points, weight, wvar, wopts, maxp1, limlst)
    346 
    347     # check the limits of integration: \int_a^b, expect a < b
--> 348     flip, a, b = b < a, min(a, b), max(a, b)
    349 
    350     if weight is None:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I would be really thankful, if someone knew how to do the fit of the function with the x-arguments as the lower boundary for the integral.

The data look like this:

0.20,0.40
0.21,0.40
0.22,0.40
0.23,0.40
0.24,0.40
0.25,0.40
0.26,0.40
0.27,0.40
0.28,0.40
0.29,0.40
0.30,0.40
0.31,0.40
0.32,0.40
0.33,0.40
0.34,0.40
0.35,0.40
0.36,0.40
0.37,0.40
0.38,0.40
0.39,0.40
0.40,0.40
0.41,0.40
0.42,0.39
0.43,0.39
0.44,0.38
0.45,0.38
0.46,0.37
0.47,0.37
0.48,0.35
0.49,0.34
0.50,0.33
0.51,0.31
0.52,0.30
0.53,0.28
0.54,0.26
0.55,0.24
0.56,0.21
0.57,0.19
0.58,0.16
0.59,0.14
0.60,0.12
0.61,0.10
0.62,0.09
0.63,0.07
0.64,0.06 
0.65,0.05
0.66,0.04
0.67,0.03
0.68,0.02
0.69,0.02
0.70,0.01
0.71,0.01
0.72,0.00
0.73,0.00
0.74,0.00
0.75,0.00
0.76,0.00
0.77,0.00
0.78,-0.00
0.79,0.00
0.80,0.00
0.81,-0.00
0.82,-0.00
0.83,-0.00
0.84,0.00
0.85,-0.00
0.86,0.00
0.87,0.00
0.88,0.00
0.89,-0.00
0.90,0.00

Solution

  • According to the documentation, scipy.integrate.quad does not take arrays, and it cannot call a function with arguments. So, we have to construct a helper function f within the function that is addressed by scipy.curve_fit:

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy import optimize, integrate
    
    #define a function that integrates or evaluates f depending on the Boolean flag func_integr
    def cerf(x, a, b, c, func_integr=True):
        f = lambda x: a*np.exp(((-1)*(x-b)**2)/(2*(c**2)))
        #flag is preset to True, so will return the integrated values
        if func_integr:
            return np.asarray([integrate.quad(f, i, np.inf)[0] for i in x])
        #unless the flag func_integr is set to False, then it will return the function values
        else:
            return f(x)
    
    #read file
    arr=np.genfromtxt("test.txt", delimiter=",")
    x_data = arr[:, 0]
    y_data = arr[:, 1]
    
    #provide reasonable start values...
    start_p = [1, 0, -1]
    #...for scipy.curve_fit
    params, params_covariance = optimize.curve_fit(cerf, x_data, y_data, p0=start_p)
    print(params)
    #[ 2.26757666  0.56501062 -0.0704476 ]
    
    
    #plot our results
    plt.plot(x_data, y_data, 'x', label='Data')
    plt.plot(x_data, cerf(x_data, *params), '-', label="fit")    
    plt.legend(loc='best')
    plt.show()
    

    Sample output:

    enter image description here

    This approach is not the fastest - every x-value is integrated individually. Maybe there are other scipy.integrate functions that can work with numpy arrays; I would not know.
    The part evaluating f instead of its integrated values is not really necessary here. But I used it initially to verify that cerf works as expected, so I left it in the script.