Tags: python, scikit-learn, scipy, least-squares, best-fit-curve

How should I choose the initial values for curve_fit so that it finds the global optimum, not just a local optimum?


I am trying to fit a power-law function to find the best-fit parameters. However, I find that if the initial parameter guess is different, the "best fit" output is different: unless I happen to pick the right initial guess, I get a local optimum instead of the global one. Is there any way to find the **appropriate initial guess**? My code is listed below. Please feel free to make any suggestions. Thanks!

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
%matplotlib inline

# power law function
def func_powerlaw(x,a,b,c):
    return a*(x**b)+c

# the data were generated from y = 2*x**-2 + 1,
# so the true parameters are a=2, b=-2, c=1
test_X = np.array([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10])
test_Y = np.array([3.0, 1.5, 1.2222222222222223, 1.125, 1.08,
                   1.0555555555555556, 1.0408163265306123, 1.03125,
                   1.0246913580246915, 1.02])

# values from the true model, for comparison in the plots
predict_Y = 2 * test_X**-2 + 1

If I go with the default initial guess, which is p0 = [1, 1, 1]:

popt, pcov = curve_fit(func_powerlaw, test_X[1:], test_Y[1:], maxfev=2000)


plt.figure(figsize=(10, 5))
plt.plot(test_X, func_powerlaw(test_X, *popt),'r',linewidth=4, label='fit: a=%.4f, b=%.4f, c=%.4f' % tuple(popt))
plt.plot(test_X[1:], test_Y[1:], '--bo')
plt.plot(test_X[1:], predict_Y[1:], '-b')
plt.legend()
plt.show()

The resulting fit, shown below, is not the best fit.

[figure: fit using the default p0, which visibly misses the data]

If I change the initial guess to p0 = [0.5, 0.5, 0.5]:

popt, pcov = curve_fit(func_powerlaw, test_X[1:], test_Y[1:], p0=np.asarray([0.5,0.5,0.5]), maxfev=2000)

I can get the best fit.

[figure: fit using p0 = [0.5, 0.5, 0.5], which matches the data]
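A cheap way to reduce this sensitivity to the starting point is multi-start fitting: run curve_fit from several random initial guesses and keep the parameters with the smallest sum of squared residuals. A minimal sketch, reusing func_powerlaw and the data above (the guess range of -3 to 3 and the 20 restarts are arbitrary assumptions, not from the original post):

# multi-start curve_fit: try several random initial guesses and
# keep the parameters with the smallest sum of squared residuals
rng = np.random.default_rng(0)
best_popt, best_sse = None, np.inf
for _ in range(20):
    p0 = rng.uniform(-3, 3, size=3)  # random starting point; range is an assumption
    try:
        popt, _ = curve_fit(func_powerlaw, test_X[1:], test_Y[1:], p0=p0, maxfev=2000)
    except RuntimeError:
        continue  # this start failed to converge; try the next one
    sse = np.sum((func_powerlaw(test_X[1:], *popt) - test_Y[1:]) ** 2)
    if sse < best_sse:
        best_popt, best_sse = popt, sse

print(best_popt, best_sse)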

--- Updated 10.7.2018 ---

As I need to run thousands or even millions of power-law fits, @James Phillips's method is too expensive. So what other method is appropriate besides curve_fit? For example, sklearn, np.linalg.lstsq, etc.
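One fast alternative (a sketch, not from the original post, and it assumes nearly noise-free data): for y = a*x**b + c, three y values taken at geometrically spaced x (such as x, 2x, 4x) satisfy (y1 - c)*(y3 - c) = (y2 - c)**2, which gives c in closed form. With c fixed, log(y - c) = log(a) + b*log(x) is linear in log(x) and can be solved with np.linalg.lstsq, with no iteration and no initial guess at all:

import numpy as np

test_X = np.array([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10])
test_Y = np.array([3.0, 1.5, 1.2222222222222223, 1.125, 1.08,
                   1.0555555555555556, 1.0408163265306123, 1.03125,
                   1.0246913580246915, 1.02])

# solve (y1 - c)*(y3 - c) = (y2 - c)**2 for c, using x = 2, 4, 8:
# at geometrically spaced x, the power term a*x**b forms a geometric progression
y1, y2, y3 = test_Y[1], test_Y[3], test_Y[7]
c_est = (y1 * y3 - y2**2) / (y1 + y3 - 2 * y2)

# with c fixed, log(y - c) = log(a) + b*log(x) is an ordinary linear fit
A = np.column_stack([np.ones_like(test_X), np.log(test_X)])
coef, *_ = np.linalg.lstsq(A, np.log(test_Y - c_est), rcond=None)
a_est, b_est = np.exp(coef[0]), coef[1]
print(a_est, b_est, c_est)  # recovers a=2, b=-2, c=1 for this data

With noisy data the closed-form c is only approximate, but the result is still a good, cheap initial guess to hand to curve_fit.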


Solution

  • Here is example code using the scipy.optimize.differential_evolution genetic algorithm with your data and equation. That scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and so it requires bounds within which to search; in this example, those bounds are taken from the data maximum and minimum values. For other problems, you might need to supply different search bounds if you know what range of parameter values to expect.

    import numpy, scipy, matplotlib
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    from scipy.optimize import differential_evolution
    import warnings
    
    # power law function
    def func_power_law(x,a,b,c):
        return a*(x**b)+c
    
    test_X = numpy.array([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    test_Y = numpy.array([3.0, 1.5, 1.2222222222222223, 1.125, 1.08,
                          1.0555555555555556, 1.0408163265306123, 1.03125,
                          1.0246913580246915, 1.02])
    
    
    # function for genetic algorithm to minimize (sum of squared error)
    def sumOfSquaredError(parameterTuple):
        warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
        val = func_power_law(test_X, *parameterTuple)
        return numpy.sum((test_Y - val) ** 2.0)
    
    
    def generate_Initial_Parameters():
        # min and max used for bounds
        maxX = max(test_X)
        minX = min(test_X)
        maxY = max(test_Y)
        minY = min(test_Y)
        maxXY = max(maxX, maxY)
    
        parameterBounds = []
        parameterBounds.append([-maxXY, maxXY]) # search bounds for a
        parameterBounds.append([-maxXY, maxXY]) # search bounds for b
        parameterBounds.append([-maxXY, maxXY]) # search bounds for c
    
        # "seed" the numpy random number generator for repeatable results
        result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
        return result.x
    
    # generate initial parameter values
    geneticParameters = generate_Initial_Parameters()
    
    # curve fit the test data
    fittedParameters, pcov = curve_fit(func_power_law, test_X, test_Y, p0=geneticParameters)
    
    print('Parameters', fittedParameters)
    
    modelPredictions = func_power_law(test_X, *fittedParameters) 
    
    absError = modelPredictions - test_Y
    
    SE = numpy.square(absError) # squared errors
    MSE = numpy.mean(SE) # mean squared errors
    RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
    Rsquared = 1.0 - (numpy.var(absError) / numpy.var(test_Y))
    print('RMSE:', RMSE)
    print('R-squared:', Rsquared)
    
    print()
    
    
    ##########################################################
    # graphics output section
    def ModelAndScatterPlot(graphWidth, graphHeight):
        f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
        axes = f.add_subplot(111)
    
        # first the raw data as a scatter plot
        axes.plot(test_X, test_Y,  'D')
    
        # create data for the fitted equation plot
        xModel = numpy.linspace(min(test_X), max(test_X))
        yModel = func_power_law(xModel, *fittedParameters)
    
        # now the model as a line plot
        axes.plot(xModel, yModel)
    
        axes.set_xlabel('X Data') # X axis data label
        axes.set_ylabel('Y Data') # Y axis data label
    
        plt.show()
        plt.close('all') # clean up after using pyplot
    
    graphWidth = 800
    graphHeight = 600
    ModelAndScatterPlot(graphWidth, graphHeight)
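The point of this two-stage approach is that differential_evolution needs only search bounds, not a starting point, to produce a good global estimate, and curve_fit then polishes that estimate to the final least-squares optimum.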