python-3.x data-fitting function-fitting

RuntimeWarning: invalid value non-linear-fit with multiple parameters and independent variable

I just lerned python a couple of weeks ago and I´m currently having a problem with fitting data to a given function. I tried differnt methods to fit my data but I keep getting the error RuntimeWarning: invalid value, or sth similiar (like divided by zero). I defined the function I want to fit in the following python function:

def pval(x, A_0, n_0, E_0, a_troe, T_3, T_1, T_2,):#depending on the method 'p' insted of the parameters i.e: pval(x,p)

#A_0, n_0, E_0, a_troe, T_3, T_1, T_2 = p[0:7]

A_u = 14099393133.869781

n_u = 0.043936990336299386

E_u = 16281.619590689397

k_u = A_u * (x[:,0]**n_u) * numpy.exp(-E_u/x[:,0])

x_troe = ((A_0 * (x[:,0]**n_0) * numpy.exp(-E_0 / x[:, 0])) / k_u) * (x[:, 1] * 1.01325 * 10**(-1)) / (8.3141 * x[:, 0])

Fc = (1-a_troe) * numpy.exp(-x[:,0]/T_3) + a_troe * numpy.exp(-x[:,0]/T_1) + numpy.exp(-T_2/x[:,0])

O = numpy.log10(x_troe) - 0.4 - 0.67 * numpy.log10(Fc)

U = 0.75 - 1.27 * numpy.log10(Fc) - 0.14 * (numpy.log10(x_troe) - 0.4 - 0.67 * numpy.log10(Fc))

log_F = numpy.log10(Fc)/(1 + (O/U)**2)

f = numpy.log10(k_u) - numpy.log10(1+1/x_troe) + log_F

return f

As you can see I want to fit 7 parameter and the function has 2 independent variables. An excerpt of my data, with which none of the methods I tried works, is given here:

x= [[1.0e+03 1.0e-02]
   [1.0e+03 1.0e-01]
 [1.0e+03 1.0e+00]
 [1.0e+03 1.0e+01]
 [1.0e+03 1.0e+02]
 [1.1e+03 1.0e-02]
 [1.1e+03 1.0e-01]
 [1.1e+03 1.0e+00]
 [1.1e+03 1.0e+01]
 [1.1e+03 1.0e+02]
 [1.2e+03 1.0e-01]
 [1.2e+03 1.0e+00]
 [1.2e+03 1.0e+01]
 [1.2e+03 1.0e+02]
 [1.3e+03 1.0e+00]
 [1.3e+03 1.0e+01]
 [1.3e+03 1.0e+02]
 [1.4e+03 1.0e+00]
 [1.4e+03 1.0e+01]
 [1.4e+03 1.0e+02]
 [1.5e+03 1.0e+01]
 [1.5e+03 1.0e+02]
 [1.6e+03 1.0e+01]
 [1.6e+03 1.0e+02]
 [1.7e+03 1.0e+02]
 [1.8e+03 1.0e+02]
 [1.9e+03 1.0e+02]
 [2.0e+03 1.0e+02]]
y = [2.89894501 2.99443594 3.11048533 3.18421145 3.20302744 3.15204054
 3.37720969 3.62462651 3.78744276 3.83868541 3.64709041 4.00417085
 4.26080811 4.36152197 4.28960902 4.63156552 4.79327409 4.50830342
 4.9238101  5.15010835 5.1568065  5.44522578 5.34414656 5.68986891
 5.89350044 6.06378356 6.20646696 6.32558954]

The first solution I tried was the leastsq() funtion:

import numpy
from scipy import *
from scipy.optimize import leastsq

def residuals(p, y, x): # in this case pval defined as pval(x,p)
    err = y - pval(x, p)
    return err
startpars = numpy.array([1.88e+13, -1.03, 11980, 0.76, 1.0e+10, 1.74, 9.33e+09], dtype=numpy.float64)
plsq = leastsq(residuals, startpars, args=(y, x), maxfev=2000)
print(plsq)

output:

RuntimeWarning: invalid value encountered in log10

I tried several startparameters next method: curve_fit

 import numpy
 from scipy import *
 from scipy.optimize import curve_fit

 print(curve_fit(pval, x, y,))

which returns the same error message as the prevous method. I tried to define startparameter and set bounds but that didn't work next method:lmfit

 import numpy
 from lmfit import Model
 startpars = numpy.array([1.88e+13, -1.03, 11980, 0.76, 1.0e+10, 1.74, 9.33e+09], dtype=numpy.float64)
 lmfit_model = Model(pval)
 lmfit_result = lmfit_model.fit(y, x=x, A_0 = startpars[0], n_0= startpars[1], E_0= startpars[2], a_troe= startpars[3], T_3= startpars[4], T_1= startpars[5], T_2= startpars[6])
 lmfit_Rsquared = 1 - lmfit_result.residual.var() / numpy.var(y)
 print('Fit R-squared:', lmfit_Rsquared, '\n')
 print(lmfit_result.fit_report())

output:

ValueError: The model function generated NaN values and the fit aborted! Please check your model function and/or set boundaries on parameters where applicable. In cases like this, using "nan_policy='omit'" will probably not work.

I also tried another solution which I found online. There the Levenberg–Marquardt algorithm was used, but its quiet a long code and I think my question is already long enough. But I can post this solution as well if needed.

I printed out some values while the optimzation was running and some values indeed become inf or -inf, but as written before setting bounds diddn´t work.

I hope anybody of you has an idea. Thank in advance. I hoe my question is formulated clearly in case you need more info i gladly provide that

Solution

The messages are telling you that your model function is generating NaN (not a number), which make the fit impossible to solve. For your model function, this is almost certainly coming either from np.log() or from np.exp().

np.log(x) is NaN for x<=0.

Pretty much every place you use np.log() (or np.log10()), you should check or ensure that the argument cannot be negative. Since the fit variables can have essentially any combination of values (well, you might add bounds if, for example, some should never be negative), you may need to check this with the model function.

Slightly less worse, but something to watch for is that np.exp() can overflow to np.Inf if the argument is >~700. Having Infs will also cause the fit to stop but are easier to guard against.

A final comment: numerical methods tend to not do a very good job when the scales of values differs by many (say, 10) orders of magnitude. The Levenberg-Marquardt method used in lmfit and scipy.optimize.leastsq tries to handle this for you, but if your variables have a known scale, it is often a good idea to have the variables be close to the same order of magnitude (say 1.e-5 to 1e5) and apply scaling factors in your model function.