python, scipy, minimization

Python vectorized minimization of a multivariate loss function without jacobian


I have a loss function that needs to be minimized:

def loss(x: np.ndarray[float]) -> float

My problem has nDim=10 dimensions. The loss function works with 1D arrays of shape (nDim,) and with 2D arrays of shape (nSample, nDim) for an arbitrary number of samples. Because the loss function is implemented in numpy, a single call with several samples packed into a 2D argument is significantly faster than several 1D calls.

The minimizer I am currently running is

sol = scipy.optimize.basinhopping(loss, x0, minimizer_kwargs={"method": "SLSQP"})

It does the job, but it is too slow. Currently, the minimizer makes single 1D calls to the loss function. Judging by the sample points, SLSQP appears to be performing numerical differentiation, sampling 11 points for each gradient estimate (presumably the current point plus one perturbation per dimension). In theory, it should be possible to implement this minimizer with vectorized function calls, requesting all 11 sample points from the loss function simultaneously.
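The batching idea can be sketched as follows. This is only an illustration under stated assumptions: `loss` here is a stand-in quadratic (not the real loss), `batched_grad` and the step size `eps` are hypothetical names and values, and plain forward differences are assumed rather than whatever scheme SLSQP uses internally. All 11 points are evaluated in one 2D call, and the result is passed to the minimizer as `jac`.

```python
import numpy as np
from scipy.optimize import minimize

N_DIM = 10

def loss(x):
    """Stand-in loss (assumption): accepts (nDim,) or (nSample, nDim)."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - 1.0) ** 2, axis=-1)

def batched_grad(x, eps=1e-6):
    """Forward-difference gradient from a single batched loss call."""
    x = np.asarray(x, dtype=float)
    # Row 0 is the base point; row i + 1 perturbs coordinate i by eps.
    points = np.vstack([x, x + eps * np.eye(x.size)])
    values = loss(points)  # one call, result has shape (nDim + 1,)
    return (values[1:] - values[0]) / eps

x0 = np.zeros(N_DIM)
res = minimize(loss, x0, jac=batched_grad, method="SLSQP")
```

The same callable can be handed to basinhopping through minimizer_kwargs={"method": "SLSQP", "jac": batched_grad}. The function value itself is still requested through separate 1D calls in this sketch.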

I was hoping that there would be a vectorize flag for SLSQP, but there does not seem to be one. Please correct me if I am wrong.

Note also that the loss function is far too complicated for analytic calculation of derivatives, so an explicit Jacobian is not an option.

Question: Does SciPy, or any other minimization library for Python, support a global optimization strategy (such as basinhopping) with a vectorized loss function and no available Jacobian?


Solution

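  • differential_evolution is a global optimizer that does not require gradients. Its vectorized keyword enables many function evaluations in a single call: with vectorized=True, the objective receives an array of shape (nDim, S) and must return an array of shape (S,), so a loss written for (nSample, nDim) input needs a transpose.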

    Alternatively, you could write a function that computes the Jacobian numerically with scipy.differentiate.jacobian, which calls the loss at all required points at once, and pass that function as the jac callable of the local minimizer. However, scipy.differentiate.jacobian is designed for accuracy, not speed, so you should probably set coarse tolerances and a low order (its tolerances and order arguments).
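A minimal sketch of the differential_evolution route, assuming a stand-in quadratic `loss` and an arbitrary search box of (-5, 5) per dimension (both are placeholders, not part of the question). The hypothetical wrapper `loss_de` only transposes the (nDim, S) array that SciPy passes into the (nSample, nDim) layout the loss expects.

```python
import numpy as np
from scipy.optimize import differential_evolution

N_DIM = 10

def loss(x):
    """Stand-in loss (assumption): accepts (nDim,) or (nSample, nDim)."""
    x = np.asarray(x, dtype=float)
    return np.sum((x - 1.0) ** 2, axis=-1)

def loss_de(x):
    # With vectorized=True, SciPy passes x with shape (nDim, S);
    # transpose it so that each row is one candidate point.
    return loss(x.T)

bounds = [(-5.0, 5.0)] * N_DIM  # assumed search box

result = differential_evolution(
    loss_de,
    bounds,
    vectorized=True,
    updating="deferred",  # vectorized evaluation implies deferred updating
    seed=0,
)
```

Each generation is then evaluated with one call to the loss instead of one call per population member. Unlike basinhopping, differential_evolution requires finite bounds on every variable.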