Search code examples
pythonscipyscipy-optimize

Why is scipy.optimize.curve_fit evaluating the initial guess repeatedly (and probably costly)?


When using curve_fit the model function is repeatedly and uselessly (and probably costly) evaluated without changing the parameters. Why does this happen?

Consider the following example for identifying the parameters of a quadratic function:

from scipy.optimize import curve_fit

i=0

def f(x, a, b):
    global i; i += 1
    print('run: {:2}, p: {:<11.10}, {:<11.10}'.format(i, a, b))
    return(x**a+b)

popt, pcov = curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5])
print('\npopt:', popt)

Output:

run:  1, p: 1.5        , 0.5        
run:  2, p: 1.5        , 0.5        
run:  3, p: 1.5        , 0.5        
run:  4, p: 1.500000022, 0.5        
run:  5, p: 1.5        , 0.5000000075
run:  6, p: 2.166073038, 0.9668143807
run:  7, p: 2.166073071, 0.9668143807
run:  8, p: 2.166073038, 0.9668143951
run:  9, p: 2.014374939, 0.9956632744
run: 10, p: 2.014374969, 0.9956632744
run: 11, p: 2.014374939, 0.9956632892
run: 12, p: 2.000113621, 0.9999686478
run: 13, p: 2.000113651, 0.9999686478
run: 14, p: 2.000113621, 0.9999686627
run: 15, p: 2.000000007, 0.999999998
run: 16, p: 2.000000037, 0.999999998
run: 17, p: 2.000000007, 1.000000013
run: 18, p: 2.0        , 1.0        

popt: [2. 1.]

The first evaluation calculates the values for the initial guess, but this is then done again in the second and third run. Only in the fourth and fifth evaluation the optimization is started by calculating the derivatives. If the function evaluations are costly and some larger-scale tolerance is acceptable, these redundant evaluations can take up a considerable amount of time.


Solution

  • WIth full_output (see leastsq docs):

    In [125]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],full_output=True)            
    run:  1, p: 1.5        , 0.5        
    run:  2, p: 1.5        , 0.5        
    run:  3, p: 1.5        , 0.5        
    run:  4, p: 1.500000022, 0.5        
    run:  5, p: 1.5        , 0.5000000075
    run:  6, p: 2.166073038, 0.9668143807
    run:  7, p: 2.166073071, 0.9668143807
    run:  8, p: 2.166073038, 0.9668143951
    run:  9, p: 2.014374939, 0.9956632744
    run: 10, p: 2.014374969, 0.9956632744
    run: 11, p: 2.014374939, 0.9956632892
    run: 12, p: 2.000113621, 0.9999686478
    run: 13, p: 2.000113651, 0.9999686478
    run: 14, p: 2.000113621, 0.9999686627
    run: 15, p: 2.000000007, 0.999999998
    run: 16, p: 2.000000037, 0.999999998
    run: 17, p: 2.000000007, 1.000000013
    run: 18, p: 2.0        , 1.0        
    Out[125]: 
    (array([2., 1.]), array([[ 0., -0.],
            [-0.,  0.]]), {'fvec': array([0., 0., 0., 0.]),
      'nfev': 16,
      'fjac': array([[-10.2688908 ,   0.        ,   0.26999885,   0.96286064],
             [ -1.2328595 ,  -1.5748198 ,   0.2521752 ,  -0.73019944]]),
      'ipvt': array([1, 2], dtype=int32),
      'qtf': array([-7.08708997e-08,  3.07498129e-09])}, 'The relative error between two consecutive iterates is at most 0.000000', 2)
    

    Where the returned information is:

    infodict : dict
    a dictionary of optional outputs with the keys:
    
    ``nfev``
        The number of function calls
    ``fvec``
        The function evaluated at the output
    ``fjac``
        A permutation of the R matrix of a QR
        factorization of the final approximate
        Jacobian matrix, stored column wise.
        Together with ipvt, the covariance of the
        estimate can be approximated.
    ``ipvt``
        An integer array of length N which defines
        a permutation matrix, p, such that
        fjac*p = q*r, where r is upper triangular
        with diagonal elements of nonincreasing
        magnitude. Column j of p is column ipvt(j)
        of the identity matrix.
    ``qtf``
        The vector (transpose(q) * fvec).
    

    So it's claiming to evaluate 16 times, not your 18. So one or more of the initial evaluations might be from curve_fit parameter checking (or something like that).

    With other methods:

    In [134]: i=0                                                                                    
    In [135]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],method='trf')                
    run:  1, p: 1.5        , 0.5        
    run:  2, p: 1.500000022, 0.5        
    run:  3, p: 1.5        , 0.5000000149
    run:  4, p: 2.166073038, 0.9668143807
    run:  5, p: 2.166073071, 0.9668143807
    run:  6, p: 2.166073038, 0.9668143956
    run:  7, p: 2.01437494 , 0.9956632758
    run:  8, p: 2.01437497 , 0.9956632758
    run:  9, p: 2.01437494 , 0.9956632907
    run: 10, p: 2.000113621, 0.9999686478
    run: 11, p: 2.000113651, 0.9999686478
    run: 12, p: 2.000113621, 0.9999686627
    run: 13, p: 2.000000007, 0.999999998
    run: 14, p: 2.000000037, 0.999999998
    run: 15, p: 2.000000007, 1.000000013
    run: 16, p: 2.0        , 1.0        
    run: 17, p: 2.00000003 , 1.0        
    run: 18, p: 2.0        , 1.000000015
    Out[135]: 
    (array([2., 1.]), array([[ 3.77052335e-34, -1.19338002e-33],
            [-1.19338002e-33,  9.94005329e-33]]))
    

    and

    In [136]: i=0                                                                                    
    In [137]: curve_fit(f, [0, 1, 2, 3], [1, 2, 5, 10], p0 = [1.5, 0.5],method='dogbox')             
    run:  1, p: 1.5        , 0.5        
    run:  2, p: 1.500000022, 0.5        
    run:  3, p: 1.5        , 0.5000000149
    run:  4, p: 2.166073038, 0.9668143807
    run:  5, p: 2.166073071, 0.9668143807
    run:  6, p: 2.166073038, 0.9668143956
    run:  7, p: 2.01437494 , 0.9956632758
    run:  8, p: 2.01437497 , 0.9956632758
    run:  9, p: 2.01437494 , 0.9956632907
    run: 10, p: 2.000113621, 0.9999686478
    run: 11, p: 2.000113651, 0.9999686478
    run: 12, p: 2.000113621, 0.9999686627
    run: 13, p: 2.000000007, 0.999999998
    run: 14, p: 2.000000037, 0.999999998
    run: 15, p: 2.000000007, 1.000000013
    run: 16, p: 2.0        , 1.0        
    run: 17, p: 2.00000003 , 1.0        
    run: 18, p: 2.0        , 1.000000015
    Out[137]: 
    (array([2., 1.]), array([[ 3.77052335e-34, -1.19338002e-33],
            [-1.19338002e-33,  9.94005329e-33]]))