python · scipy · scipy-optimize

Desired error not necessarily achieved due to precision loss.


I am trying to do maximum likelihood estimation (MLE) for a school assignment. Why could I be getting this error?

from scipy import stats
from scipy.optimize import minimize

#1) Find best fitting distribution parameters
best_fit(macamil2data) 
#     {'loglaplace': {'c': 1.0603278885766216,
#                     'loc': -0.04671203840594998,
#                     'scale': 10.230045114772532}}

#2) Calculate pdf using said parameters
def loglaplace_loglikelihood(params, data):
    c, loc, scale = params
    return stats.loglaplace.logpdf(data, c=c, loc=loc, scale=scale).sum()

#3) Minimize using said parameters
initial_params = [1.0603278885766216, -0.04671203840594998, 10.230045114772532]
results = minimize(loglaplace_loglikelihood, initial_params, args=(macamil2data))

print(results)

 message: Desired error not necessarily achieved due to precision loss.
  success: False
   status: 2
      fun: nan
        x: [ 5.127e+03 -1.765e+05 -1.945e+03]
      nit: 2
      jac: [       nan        nan        nan]
 hess_inv: [[ 6.876e-01 -1.388e+01  5.195e-02]
            [-1.388e+01  5.437e+02  5.473e+00]
            [ 5.195e-02  5.473e+00  1.000e+00]]
     nfev: 464
     njev: 116

I tried lowering gtol, but it seems I am making a systematic error; this is my first time working with data in Python. Sorry if it's silly.

macamil2data=
0.916666
1
3.3
3.38333
3.68333
4.16667
4.2
6.08333
6.61667
7.03333
7.5
7.85
8.15
9.08333
9.35
10.0833
10.1833
10.4333
11.2833
14.2
16.5333
20.0333
23.8333
30.35
30.5167
32.4667
37.1
40.8167
45.6
52
70.0667
70.5333
85.2333
130.967

Solution

  • You have two problems:

    • A missing minus sign in your likelihood function, since you want to minimize the negative log-likelihood;
    • Probably not enough data to ensure proper convergence for this long-tailed distribution.

    For the parameters you provided, it seems at least a thousand points are needed.

    import numpy as np
    from scipy import stats, optimize
    
    p = [1.0603278885766216, -0.04671203840594998, 10.230045114772532]
    X = stats.loglaplace(c=p[0], loc=p[1], scale=p[2])
    
    np.random.seed(123456)
    data = X.rvs(3000)
    
    def likelihood(params, data):
        c, loc, scale = params
        # Mind the minus sign for minimization
        return - stats.loglaplace.logpdf(data, c=c, loc=loc, scale=scale).sum()
    
    p0 = [1., -1., 10.]
    results = optimize.minimize(likelihood, p0, args=(data,))
    #       fun: 11834.328446172953
    #  hess_inv: array([[ 3.98302416e-04, -4.38626807e-06, -5.10104269e-04],
    #        [-4.38626807e-06,  8.95197993e-06,  2.10634948e-05],
    #        [-5.10104269e-04,  2.10634948e-05,  2.66869796e-02]])
    #       jac: array([0., 0., 0.])
    #  message: 'Optimization terminated successfully.'
    #     nfev: 116
    #      nit: 15
    #     njev: 27
    #   status: 0
    #  success: True
    #        x: array([ 1.07636068, -0.04651799, 10.13396896])
    

    With your data it gives:

    original_data = np.array([0.916666, 1, 3.3, 3.38333, 3.68333, 4.16667, 4.2, 6.08333, 6.61667, 7.03333, 7.5, 7.85, 8.15, 9.08333, 9.35, 10.0833, 10.1833, 10.4333, 11.2833, 14.2, 16.5333, 20.0333, 23.8333, 30.35, 30.5167, 32.4667, 37.1, 40.8167, 45.6, 52, 70.0667, 70.5333, 85.2333, 130.967])
    
     #      fun: 143.3183272859989
     # hess_inv: array([[ 0.06063633, -0.27002167, -0.01566576],
     #       [-0.27002167,  2.81922376,  0.21352969],
     #       [-0.01566576,  0.21352969,  0.77303566]])
     #     jac: array([ 0.73000717, -1.26998901, -0.43971252])
     #  message: 'Desired error not necessarily achieved due to precision loss.'
     #     nfev: 52
     #      nit: 2
     #     njev: 11
     #   status: 2
     #  success: False
     #        x: array([ 1.12952744, -0.4226847 , 10.2751086 ])
    

    Where we can confirm the gradient has not vanished (the jac entries are far from zero), hence the warning message.
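
    If you must work with the 34 points you have, two things may still help (not part of the original answer, just a sketch using standard SciPy calls): every scipy.stats continuous distribution ships its own MLE routine, rv_continuous.fit, and a gradient-free method such as Nelder-Mead sidesteps the noisy finite-difference gradient that triggers the precision-loss warning in BFGS.

    # Sketch only; assumes `likelihood` and `original_data` are defined as above.

    # 1) SciPy's built-in maximum-likelihood fit;
    #    for loglaplace it returns (c, loc, scale).
    c_hat, loc_hat, scale_hat = stats.loglaplace.fit(original_data)

    # 2) A gradient-free minimizer does not use the gradient at all, so it
    #    cannot stop with the BFGS precision-loss message.
    results = optimize.minimize(likelihood, [1., -1., 10.],
                                args=(original_data,),
                                method='Nelder-Mead')

    Either way, with only 34 observations the parameter estimates for a long-tailed distribution will remain uncertain, as noted above.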