Tags: python, optimization, scipy, log-likelihood

Scipy optimizer explodes/diverges when changing the way the parameters are specified


I have a negative log-likelihood function to minimize. I want to pass the array of observations as an argument of the function to be optimized, rather than generating it inside the function, but strangely the optimizer then explodes. I would like to understand why this happens and what needs to change to get a converging optimizer.

I pass the observations to the function like this (mn stands for scipy.optimize.minimize):

import numpy as np
from scipy.optimize import minimize as mn

def f(x, d):
    alfa = x[0]
    lambda_ = x[1]
    # n is read from the enclosing scope when f is called
    return -n * np.log(alfa) * lambda_ + alfa * sum(d)

n = 2000                         # number of observations
y = np.random.exponential(2, n)  # vector of observations

res = mn(f, x0=[2, 1/2], args=(y,))

and the results are:

    fun: nan
 hess_inv: array([[0.67448386, 0.61331579],
       [0.61331579, 0.55866767]])
      jac: array([nan, nan])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 452
      nit: 2
     njev: 113
   status: 2
  success: False
        x: array([-2947.66055677, -2680.19131049])

whereas if I generate the observations inside the function instead of passing them as a parameter,

def f(x):
    alfa = x[0]
    lambda_ = x[1]
    n = 2000
    y = np.random.exponential(2, n)   # new random data on every call
    return -n * np.log(alfa) * lambda_ + alfa * sum(y)

mn(f, x0=[2, 2])

I get quite good estimates:

    fun: 5072.745186459168
 hess_inv: array([[ 3.18053796e-16, -1.07489375e-15],
       [-1.07489371e-15,  3.63271745e-15]])
      jac: array([1.65160556e+10, 1.11412293e+10])
  message: 'Desired error not necessarily achieved due to precision loss.'
     nfev: 122
      nit: 3
     njev: 28
   status: 2
  success: False
        x: array([1.99998635, 1.99999107])

Even though the optimizer doesn't report the run as a success.


Solution

  • Part 1

    First, let me answer the explicit question:

    I would like to understand why this happens and what needs to change to get a converging optimizer.

    The function - n * np.log(alfa) * lambda_ + alfa * sum(d) cannot be minimized as written. For any given finite lambda_ > 0, the function has a unique minimum in alfa, because the terms -log(alfa) and +alfa grow at different rates. Now look at lambda_ in isolation: the function is linear in lambda_ and therefore unbounded below; its "minimum" lies only at lambda_ = +inf.

    The optimizer keeps trying to push lambda_ towards infinity, but a larger lambda_ also moves the minimizing alfa further out, so things diverge quickly...

    At least in theory. In practice, sooner or later the optimizer tries a negative alfa, which makes the logarithm return nan. For some reason this makes the optimizer think it is a good idea to decrease alfa even further.
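
    Both effects are easy to reproduce by reusing f, n and y from the first snippet in the question (a quick check, not part of the original post):

        for lam in [1, 10, 100, 1000]:
            print(lam, f([2.0, lam], y))   # decreases without bound as lambda_ grows

        print(f([-1.0, 1.0], y))           # negative alfa: np.log returns nan (with a RuntimeWarning)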

    Two steps to achieve a converging optimizer (a minimal sketch follows the list):

    1. Use a cost function that can be minimized (e.g. keep lambda_ fixed, or put a penalty on large values of lambda_).
    2. Tell the optimizer that alfa must not be negative, with bounds=[(0, np.inf), (-np.inf, np.inf)].
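
    A minimal sketch of those two steps, assuming lambda_ is simply held fixed at 1 (so the exponential rate alfa is the only free parameter) and a small positive lower bound stands in for "alfa > 0"; the seed and the name nll are illustrative, not part of the original code:

        import numpy as np
        from scipy.optimize import minimize as mn

        rng = np.random.default_rng(0)
        n = 2000
        y = rng.exponential(2, n)          # scale 2, i.e. true rate 1/2

        def nll(x, d):
            alfa = x[0]                    # only free parameter; lambda_ is fixed to 1
            return -n * np.log(alfa) + alfa * np.sum(d)

        res = mn(nll, x0=[2.0], args=(y,), bounds=[(1e-9, None)])
        print(res.x)                       # close to n / sum(y), i.e. about 0.5

    With the data generated once, passed through args, and alfa bounded away from zero, the optimizer converges to the maximum-likelihood rate.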

  • Part 2

    What happens in the "not as a parameter" scenario is an entirely different story. With y = np.random.exponential(2, n) inside the function, the cost function generates new random values every time it is called. This makes the cost function stochastic, whereas the optimizer expects a deterministic function.

    Each time the optimizer calls the function it gets a partially random result. This confuses the optimizer, and after three iterations (nit: 3) it gives up. In three steps the estimate has barely moved away from the initial values, which is why the result still looks like a good estimate.
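
    One quick way to see this, assuming the second definition of f from the question is in scope, is to evaluate it twice at the same point:

        x0 = [2.0, 2.0]
        print(f(x0))   # one value ...
        print(f(x0))   # ... and a different one, because y is redrawn inside f

    Generating y once outside the function and passing it in (as in Part 1) makes the cost function deterministic again.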