Search code examples
pythoncurve-fittingnumerical-methodsdata-fittingmodel-fitting

Fitting parameter inside an integral using python (or another useful language)


I have a set of data, basically with the information of f(x) as a function of x, and x itself. I know from the theory of the problem that I'm working on the format of f(x), which is given as the expression below:

Eq. 1

Essentially, I want to use this set of data to find the parameters a and b. My problem is: How can I do that? What library should I use? I would like an answer using Python. But R or Julia would be ok as well.

From everything I had done so far, I've read about a functionallity called curve fit from the SciPy library but I'm having some trouble in which form I would do the code as long my x variable is located in one of the integration limit.

For better ways of working with the problem, I also have the following resources:

A sample set, for which I know the parameters I'm looking for. To this set I know that a = 2 and b = 1 (and c = 3). And before it rises some questions about how I know these parameters: I know they because I created this sample set using this parameters from the integration of the equation above just to use the sample to investigate how can I find them and have a reference.

I also have this set, for which the only information I have is that c = 4 and want to find a and b.

I would also like to point out that:

i) right now I have no code to post here because I don't have a clue how to write something to solve my problem. But I would be happy to edit and update the question after reading any answer or help that you guys could provide me.

ii) I'm looking first for a solution where I don't know a and b. But in case that it is too hard I would be happy to see some solution where I suppose that one either a or b is known.

EDIT 1: I would like to reference this question to anyone interested in this problem as it's a parallel but also important discussion to the problem faced here


Solution

  • I would use a pure numeric approach, which you can use even when you can not directly solve the integral. Here's a snipper for fitting only the a parameter:

    import numpy as np
    from scipy.optimize import curve_fit
    import pandas as pd
    import matplotlib.pyplot as plt
    
    def integrand(x, a):
        b = 1
        c = 3
        return 1/(a*np.sqrt(b*(1+x)**3 + c*(1+x)**4))
    
    def integral(x, a):
        dx = 0.001
        xx = np.arange(0, x, dx)
        arr = integrand(xx, a)
        return np.trapz(arr, dx=dx, axis=-1)
    
    vec_integral = np.vectorize(integral)
    
    df = pd.read_csv('data-with-known-coef-a2-b1-c3.csv')
    x = df.domin.values
    y = df.resultados2.values
    out_mean, out_var = curve_fit(vec_integral, x, y, p0=[2])
    
    plt.plot(x, y)
    plt.plot(x, vec_integral(x, out_mean[0]))
    plt.title(f'a = {out_mean[0]:.3f} +- {np.sqrt(out_var[0][0]):.3f}')
    plt.show()
    
    vec_integral = np.vectorize(integral)
    

    enter image description here

    Of course, you can lower the value of dx to get the desired precision. While for fitting just the a, when you try to fir b as well, the fit does not converge properly (in my opinion because a and b are strongly correlated). Here's what you get:

    def integrand(x, a, b):
        c = 3
        return 1/(a*np.sqrt(np.abs(b*(1+x)**3 + c*(1+x)**4)))
    
    def integral(x, a, b):
        dx = 0.001
        xx = np.arange(0, x, dx)
        arr = integrand(xx, a, b)
        return np.trapz(arr, dx=dx, axis=-1)
    
    vec_integral = np.vectorize(integral)
    
    out_mean, out_var = sp.optimize.curve_fit(vec_integral, x, y, p0=[2,3])
    plt.title(f'a = {out_mean[0]:.3f} +- {np.sqrt(out_var[0][0]):.3f}\nb = {out_mean[1]:.3f} +- {np.sqrt(out_var[1][1]):.3f}')
    
    plt.plot(x, y, alpha=0.4)
    plt.plot(x, vec_integral(x, out_mean[0], out_mean[1]), color='green', label='fitted solution')
    plt.plot(x, vec_integral(x, 2, 1),'--', color='red', label='theoretical solution')
    plt.legend()
    plt.show()
    

    enter image description here

    As you can see, even if the resulting a and b parameters form the fit are "not good", the plot is very similar.