I need to optimize a scalar function in a high-dimensional space. The function varies rapidly with its arguments, so the gradients are large in absolute value. The optimizers in scipy.optimize.minimize
fail because the minimization procedure takes steps that are too large. The following code illustrates the problem using a simple quadratic function.
from scipy.optimize import minimize

def objective(x, scalar=1):
    """
    Quadratic objective function with optional scalar.
    """
    # Report function call for debugging
    print("objective({}, scalar={})".format(x, scalar))
    # Return function value and gradient
    return x ** 2 * scalar, 2 * x * scalar

# This optimisation succeeds
print(minimize(objective, 1, jac=True))

# This optimisation fails
print(minimize(objective, 1, (1e8,), jac=True))
Of course, I can manually rescale the value and gradient of the function of interest, but I was wondering whether there is a recommended approach for minimizing such functions, e.g. specifying a learning rate.
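For concreteness, the manual rescaling I have in mind is just wrapping the objective and dividing both the value and the gradient by a constant of roughly the right magnitude. The sketch below reuses objective and minimize from the snippet above; the wrapper name and the choice of 1e8 as the scale are only illustrative.

import numpy as np

def rescaled_objective(x, scalar=1, scale=1e8):
    """
    Same quadratic objective, but with value and gradient divided by a constant.
    """
    value, grad = objective(x, scalar)
    return value / scale, np.asarray(grad) / scale

# With the rescaled function the default step sizes are reasonable again
print(minimize(rescaled_objective, 1, (1e8,), jac=True))

This works, but it requires me to know the scale in advance and to repeat it for every function I want to minimize, hence the question about a built-in option such as a learning rate.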
For large non-linear optimization problems, one would typically pay attention to (at least) four things:

1. Scaling. Some more advanced solvers provide support for automatic scaling, but scaling a non-linear problem is not easy because the Jacobian changes over the course of the iterations (typical strategies are: scale only the linear part, scale the linear and non-linear parts once at the beginning based on the initial values, or rescale the problem during the iteration process). Linear solvers have an easier job in this respect, since the Jacobian is constant and can be scaled once at the beginning. scipy.optimize.minimize is not the most advanced solver, so I would encourage you to scale things yourself. Typically you can only do this once before the solver starts; in some cases you can even stop the solver to rescale and then call it again using the last point as the initial value (see the sketch below). This sounds crazy, but the trick has helped me a few times.

2. A good initial point, which keeps the solver in reasonable regions where functions and gradients can be reliably evaluated.

3. Good bounds, for the same reason.

4. Model reformulations that provide better scaling (replace division by multiplication, take logs, etc.).
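To make the self-scaling and restart advice a bit more concrete, here is a rough sketch against the quadratic objective from the question. The helper name, the number of restart rounds and the gradient-based scale estimate are only illustrative choices, not a general recipe.

import numpy as np
from scipy.optimize import minimize

def solve_with_rescaling(fun, x0, args=(), rounds=3):
    """
    Scale the objective by the gradient magnitude at the current point,
    run the solver, then restart from the last point with a fresh scale.
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    result = None
    for _ in range(rounds):
        # Estimate a scale factor from the gradient at the current point
        _, grad = fun(x, *args)
        scale = max(float(np.max(np.abs(grad))), 1.0)

        def scaled(xk, *fargs, _scale=scale):
            value, gradient = fun(xk, *fargs)
            return value / _scale, np.asarray(gradient) / _scale

        result = minimize(scaled, x, args, jac=True)
        x = result.x  # the last point becomes the initial value for the next round
    return result

print(solve_with_rescaling(objective, 1.0, (1e8,)))

Dividing both the value and the gradient by the same positive constant does not move the minimizer; it only brings the gradients back to order one so that the solver's default step sizes become reasonable.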