Search code examples
pythoncurve-fittingscipy-optimize

Results From Curve_Fit Are Off


I am trying to reproduce some results from a paper using non-linear parameter estimation, the problem, however, is that when I use curve_fit, all I get back is an array of 1's as opposed to anything close to the results I should be getting.

I have included a minimum working example to illustrate what I am getting and then also the actual results:

import pandas as pd
import numpy as np
from scipy.optimize import curve_fit

xdata = np.array([0.00, 27.01,84.15,134.66,178.74,217.00,250.20,279.06,304.24, 
                  326.29,346.71,362.87,378.13,391.75,403.96,414.96])

ydata = np.array([0.00,440.00,933.00,1154.00,1226.00,1222.00,1185.00,
                 1134.00,1081.00,1031.00,984.00,942.00,904.00,870.00,840.00,814.00])

# Non-Linear Estimation Function
def func(V,A,d):
    return A*V*exp(-1*d*V)

popt, pcov = curve_fit(func,xdata,ydata)

popt
array([1., 1.])

The actual results I should be getting are the following:

param = estimate (standard err)
A = 17.6 (0.132)
d = 5.27 x 10^-3 (2.61 x 10^-5)

Solution

  • Scipy;s curve_fit() routine uses all 1.0 values for initial parameter estimates if none are supplied. If curve_fit() cannot make any improvement on the initial parameter estimates, it will simply return them - which is why you get "fitted" parameter values of all 1.0. Here is a graphical Python fitter with your data and equation, using scipy's differential_evolution genetic algorithm module to supply initial parameter estimates for the non-linear fitter. That scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search. In this example, those bounds are derived from the data max and min values. Note that it is much easier to supply ranges for the parameters than to give specific values.

    plot

    import numpy, scipy, matplotlib
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    from scipy.optimize import differential_evolution
    import warnings
    
    x = [0.00, 27.01,84.15,134.66,178.74,217.00,250.20,279.06,304.24, 
                      326.29,346.71,362.87,378.13,391.75,403.96,414.96]
    
    y = [0.00,440.00,933.00,1154.00,1226.00,1222.00,1185.00,
                     1134.00,1081.00,1031.00,984.00,942.00,904.00,870.00,840.00,814.00]
    
    xData = numpy.array(x, dtype=float)
    yData = numpy.array(y, dtype=float)
    
    
    # Non-Linear Estimation Function
    def func(V,A,d):
        return A*V*numpy.exp(-1.0*d*V)
    
    
    # function for genetic algorithm to minimize (sum of squared error)
    def sumOfSquaredError(parameterTuple):
        warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
        val = func(xData, *parameterTuple)
        return numpy.sum((yData - val) ** 2.0)
    
    
    def generate_Initial_Parameters():
        # min and max used for bounds
        maxX = max(xData)
        minX = min(xData)
        #maxY = max(yData)
        #minY = min(yData)
    
        parameterBounds = []
        parameterBounds.append([minX, maxX/10.0]) # search bounds for A
        parameterBounds.append([minX, maxX/10.0]) # search bounds for d
    
        # "seed" the numpy random number generator for repeatable results
        result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
        return result.x
    
    # by default, differential_evolution completes by calling curve_fit() using parameter bounds
    geneticParameters = generate_Initial_Parameters()
    
    # now call curve_fit without passing bounds from the genetic algorithm,
    # just in case the best fit parameters are aoutside those bounds
    fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
    print('Fitted parameters:', fittedParameters)
    print()
    
    modelPredictions = func(xData, *fittedParameters) 
    
    absError = modelPredictions - yData
    
    SE = numpy.square(absError) # squared errors
    MSE = numpy.mean(SE) # mean squared errors
    RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
    Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
    
    print()
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)
    
    print()
    
    
    ##########################################################
    # graphics output section
    def ModelAndScatterPlot(graphWidth, graphHeight):
        f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
        axes = f.add_subplot(111)
    
        # first the raw data as a scatter plot
        axes.plot(xData, yData,  'D')
    
        # create data for the fitted equation plot
        xModel = numpy.linspace(min(xData), max(xData))
        yModel = func(xModel, *fittedParameters)
    
        # now the model as a line plot
        axes.plot(xModel, yModel)
    
        axes.set_xlabel('X Data') # X axis data label
        axes.set_ylabel('Y Data') # Y axis data label
    
        plt.show()
        plt.close('all') # clean up after using pyplot
    
    graphWidth = 800
    graphHeight = 600
    ModelAndScatterPlot(graphWidth, graphHeight)