Search code examples
pythonscipycurve-fittingchemistry

Divide by zero error in scipy.optimize.curvefit


I am trying to fit three time-evolution curves with two rate constants, k1 and k2. The system of equations I am trying to solve is:

A_t = A_0 * exp(-k1*t)
B_t = [A_0 * k1/(k2-k1)]* exp(-k1*t) - [A_0*(k1/(k2-k1)-B_0] * exp(-k2*t)
C_t = [A_0 * -k2/(k2-k1) ]* exp(-k1*t) + [A_0*(k1/(k2-k1)-B_0] * exp(-k2*t) + A_0 + B_0

where I want fit the best values of k1 and k2 to my data values of A,B and C, where A_t etc is the current population of A at time t, A_0=0.4 and B_0=0.6.

To solve this I am using the scipy.optimize.curve_fit function where I set up the equations as matrices u and w. In the following, I have manually entered the A_0=0.4 and B_0=0.6 into the function (which relates to part 2 of my question at the bottom):

def func(t,k1,k2):

    u = np.array([[0.4,0,0],
                  [0.4*k1/(k2-k1),-0.4*(k1/(k2-k1))+0.6,0],
                  [0.4*(-k2/(k2-k1)),0.4*k1/(k2-k1)-0.6,1]])


    w = np.array([np.exp(-t*k1),
                  np.exp(-t*k2),
                  np.ones_like(t)])

    return np.dot(u,w).flatten()

To solve for some test data, I create a test set where I set k1=0.03 and k2=0.003:

t=np.arange(1000)*0.5
test=func(t,0.03,0.004).reshape((3,1000))
test+=np.random.normal(size=test.shape)*0.01

which produces the following plot:

Plot of test

When I then try to fit values of k1 and k2, I get the following error:

popt,popc=optimize.curve_fit(func,t,test.flatten(),method='lm')

/usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:4: RuntimeWarning: divide by zero encountered in double_scalars after removing the cwd from sys.path. /usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:5: RuntimeWarning: divide by zero encountered in double_scalars """ /usr/local/lib/python3.6/site-packages/scipy/optimize/minpack.py:785: OptimizeWarning: Covariance of the parameters could not be estimated category=OptimizeWarning)

I understand that there is a divide by zero error somewhere here, but I am not sure where it is or how to solve it. So, my questions are:

  1. How to solve for this error in the curve_fit function?
  2. Is there a way to pass A_0 and B_0 into optimize.curve_fit, rather than manually entering the concentrations as I have done above? My understanding is that only the independent variable t and the unknowns k1 and k2 can be passed to the function?

Thank you for any help that can be provided


Solution

  • Per the comments, here is an example of using scipy's differential_evolution genetic algorithm module for estimating initial parameters. This module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and that algorithm requires bounds within which to search. In this example, those bounds are taken from the data minimum and maximum values. The fitting completes with a call to curve_fit() without passing bounds from the genetic algorithm search, in case the best parameters are outside those bounds.

    import numpy, scipy, matplotlib
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    from scipy.optimize import differential_evolution
    import warnings
    
    xData = numpy.array([19.1647, 18.0189, 16.9550, 15.7683, 14.7044, 13.6269, 12.6040, 11.4309, 10.2987, 9.23465, 8.18440, 7.89789, 7.62498, 7.36571, 7.01106, 6.71094, 6.46548, 6.27436, 6.16543, 6.05569, 5.91904, 5.78247, 5.53661, 4.85425, 4.29468, 3.74888, 3.16206, 2.58882, 1.93371, 1.52426, 1.14211, 0.719035, 0.377708, 0.0226971, -0.223181, -0.537231, -0.878491, -1.27484, -1.45266, -1.57583, -1.61717])
    yData = numpy.array([0.644557, 0.641059, 0.637555, 0.634059, 0.634135, 0.631825, 0.631899, 0.627209, 0.622516, 0.617818, 0.616103, 0.613736, 0.610175, 0.606613, 0.605445, 0.603676, 0.604887, 0.600127, 0.604909, 0.588207, 0.581056, 0.576292, 0.566761, 0.555472, 0.545367, 0.538842, 0.529336, 0.518635, 0.506747, 0.499018, 0.491885, 0.484754, 0.475230, 0.464514, 0.454387, 0.444861, 0.437128, 0.415076, 0.401363, 0.390034, 0.378698])
    
    
    def func(t, n_0, L, offset): #exponential curve fitting function
        return n_0*numpy.exp(-L*t) + offset
    
    
    # function for genetic algorithm to minimize (sum of squared error)
    def sumOfSquaredError(parameterTuple):
        warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
        val = func(xData, *parameterTuple)
        return numpy.sum((yData - val) ** 2.0)
    
    
    def generate_Initial_Parameters():
        # min and max used for bounds
        maxX = max(xData)
        minX = min(xData)
        maxY = max(yData)
        minY = min(yData)
    
        parameterBounds = []
        parameterBounds.append([minX, maxX]) # seach bounds for n_0
        parameterBounds.append([minX, maxX]) # seach bounds for L
        parameterBounds.append([0.0, maxY]) # seach bounds for Offset
    
        # "seed" the numpy random number generator for repeatable results
        result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
        return result.x
    
    # by default, differential_evolution completes by calling curve_fit() using parameter bounds
    geneticParameters = generate_Initial_Parameters()
    
    # now call curve_fit without passing bounds from the genetic algorithm,
    # just in case the best fit parameters are aoutside those bounds
    fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
    print('Fitted parameters:', fittedParameters)
    print()
    
    modelPredictions = func(xData, *fittedParameters) 
    
    absError = modelPredictions - yData
    
    SE = numpy.square(absError) # squared errors
    MSE = numpy.mean(SE) # mean squared errors
    RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
    Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
    
    print()
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)
    
    print()
    
    
    ##########################################################
    # graphics output section
    def ModelAndScatterPlot(graphWidth, graphHeight):
        f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
        axes = f.add_subplot(111)
    
        # first the raw data as a scatter plot
        axes.plot(xData, yData,  'D')
    
        # create data for the fitted equation plot
        xModel = numpy.linspace(min(xData), max(xData))
        yModel = func(xModel, *fittedParameters)
    
        # now the model as a line plot
        axes.plot(xModel, yModel)
    
        axes.set_xlabel('X Data') # X axis data label
        axes.set_ylabel('Y Data') # Y axis data label
    
        plt.show()
        plt.close('all') # clean up after using pyplot
    
    graphWidth = 800
    graphHeight = 600
    ModelAndScatterPlot(graphWidth, graphHeight)