I have a negative log-likelihood function to minimize. I want to pass the array of observations as a parameter of the objective function rather than hard-coding it inside the function, but strangely the optimizer blows up. I would like to understand why this happens and, if possible, what needs to change to get a converging optimizer.
I pass the observations as an extra argument of the function like this (mn stands for scipy.optimize.minimize):
import numpy as np
from scipy.optimize import minimize as mn

def f(x, d):
    alfa = x[0]
    lambda_ = x[1]
    return -n * np.log(alfa) * lambda_ + alfa * sum(d)

n = 2000                          # number of observations
y = np.random.exponential(2, n)   # vector of observations
res = mn(f, x0=[2, 1/2], args=y)
and the results are:
fun: nan
hess_inv: array([[0.67448386, 0.61331579],
[0.61331579, 0.55866767]])
jac: array([nan, nan])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 452
nit: 2
njev: 113
status: 2
success: False
x: array([-2947.66055677, -2680.19131049])
whereas if I define the observations inside the function instead of passing them as a parameter,
def f(x):
    alfa = x[0]
    lambda_ = x[1]
    n = 2000
    y = np.random.exponential(2, n)
    return -n * np.log(alfa) * lambda_ + alfa * sum(y)

mn(f, x0=[2, 2])
I get what look like quite good estimates:
fun: 5072.745186459168
hess_inv: array([[ 3.18053796e-16, -1.07489375e-15],
[-1.07489371e-15, 3.63271745e-15]])
jac: array([1.65160556e+10, 1.11412293e+10])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 122
nit: 3
njev: 28
status: 2
success: False
x: array([1.99998635, 1.99999107])
This is so even though the optimizer does not consider it a success.
Part 1
First, let me answer the explicit question:
Could someone please explain why, and possibly tell me what to change to get a converging optimizer?
The function

    -n * np.log(alfa) * lambda_ + alfa * sum(d)

cannot be minimized. For any fixed, finite lambda_ it has a unique minimum in alfa, because the terms -log(alfa) and +alfa grow at different rates. In lambda_, however, the function is linear: with the starting value alfa = 2 we have log(alfa) > 0, so the cost keeps decreasing as lambda_ grows and its minimum lies only at lambda_ = +inf.
The optimizer is trying to increase lambda towards infinity, but increasing lambda also increases the alfa-minimum and things diverge quickly...
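A quick numerical check makes the divergence concrete (the variable names and the sample values of lambda_ below are arbitrary choices of mine): for each lambda_, plug in the alfa that minimizes the cost, alfa = n * lambda_ / sum(d), obtained by setting the derivative with respect to alfa to zero, and the minimized cost still keeps falling as lambda_ grows.

import numpy as np

n = 2000
y = np.random.exponential(2, n)

def f(x, d):
    alfa, lambda_ = x
    return -n * np.log(alfa) * lambda_ + alfa * sum(d)

# alfa = n * lambda_ / sum(d) minimizes the cost for a fixed lambda_,
# yet the minimized value keeps decreasing without bound as lambda_ grows.
for lambda_ in [1, 10, 100, 1000]:
    alfa_opt = n * lambda_ / sum(y)
    print(lambda_, f([alfa_opt, lambda_], y))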
At least in theory. In practice, sooner or later the optimizer tries a negative alfa, which makes the logarithm return nan. For some reason this convinces the optimizer that decreasing alfa even further is a good idea.
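You can reproduce the nan seen in the question's output directly; this small check is just an illustration:

import numpy as np

print(np.log(-1.0))             # nan, plus a RuntimeWarning: invalid value encountered in log
print(np.nan < 0, np.nan > 0)   # False False -- comparisons with nan give the line search no usable direction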
Two steps to achieve a converging optimizer:

1. Use a cost function that actually has a finite minimum. As written, the cost is unbounded below in lambda_; the plain exponential negative log-likelihood with rate alfa is -n * np.log(alfa) + alfa * sum(d), without the lambda_ factor.
2. Keep the optimizer away from non-positive alfa values by passing bounds to minimize, together with a method that supports them (e.g. L-BFGS-B). If both parameters are kept, that is

   bounds=[(0, np.inf), (-np.inf, np.inf)]
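Putting both steps together, something like the following sketch should converge. The details are my own choices rather than code from the post: lambda_ is dropped entirely (so only one bounds pair remains), the lower bound 1e-9 merely keeps log(alfa) defined at the boundary, and the fixed seed only makes the run reproducible.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 2000
y = rng.exponential(2, n)            # true scale 2, i.e. true rate 0.5

def nll(x, d):
    # Exponential negative log-likelihood with rate alfa (no lambda_ factor).
    alfa = x[0]
    return -len(d) * np.log(alfa) + alfa * np.sum(d)

res = minimize(nll, x0=[2.0], args=(y,),
               method='L-BFGS-B', bounds=[(1e-9, None)])

print(res.x)           # close to the maximum-likelihood estimate n / sum(y), about 0.5
print(n / np.sum(y))

With the unbounded lambda_ term gone and alfa kept positive, the cost has a single finite minimum and L-BFGS-B reaches it in a handful of iterations.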
Part 2
What's happening in the "not as parameter" scenario is an entirely different story. With
y = np.random.exponential(2 , n)
inside the function, the cost function draws new random values every time it is called. This makes the cost stochastic, but the optimizer expects a deterministic function.
Each time the optimizer calls the function it gets a partially random result. This confuses the optimizer, and after three iterations (nit: 3) it gives up. In those three steps the estimate has barely moved away from the initial values x0 = [2, 2], which is the only reason the result looks like a good fit.
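The stochasticity is easy to demonstrate with a quick check of my own: call the "internal y" version of the cost twice with exactly the same arguments.

import numpy as np

def f(x):
    alfa = x[0]
    lambda_ = x[1]
    n = 2000
    y = np.random.exponential(2, n)   # a fresh sample on every call
    return -n * np.log(alfa) * lambda_ + alfa * sum(y)

print(f([2, 2]))   # two calls with identical arguments return
print(f([2, 2]))   # different values, so the finite-difference
                   # gradients the optimizer computes are mostly noise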