
scipy's direct fails (almost) immediately on this toy optimization problem


Consider the following simple MWE:

import numpy as np
from scipy.optimize import direct

def score(x):
    # Penalty of 1 or 2 whenever the count of entries in [4, 6]
    # is not a multiple of 3
    parity_in_range = len([v for v in x if 4 <= v <= 6]) % 3
    # Largest jump between consecutive entries
    main_score = np.max(np.abs(np.diff(x)))
    return main_score + parity_in_range

length = 20
bounds = [(0, 10)] * length
result = direct(score, locally_biased=False, bounds=bounds,
                maxiter=10000, maxfun=10000)
print(result)

An optimal solution is to make all the parameters equal and outside the range 4 to 6, e.g. all 3s. This gives a function value of 0. The optimization works with varying degrees of success with scipy's other optimizers, but it fails almost instantly with direct, which reports:

 message: The volume of the hyperrectangle containing the lowest function value found is below vol_tol=1e-16
 success: True
  status: 4
     fun: 2.0
       x: [ 5.000e+00  5.000e+00 ...  5.000e+00  5.000e+00]
     nit: 2
    nfev: 157

I am not sure it should report success, but the real problem is that it gives up after only 157 function evaluations with that warning.

Is there any way to get direct to optimize this function?


Solution

  • The termination parameter vol_tol implicitly depends on the number of dimensions of the problem. The search space is divided into a series of hyperrectangles, and after the first pass the smallest, central hyperrectangle has a volume of (1/3)^n, where n is the number of dimensions.

    With n=20, this means the innermost cube will have a volume of (1/3)^20 ≈ 2.87e-10. If that cube's midpoint happens to be the lowest point found, the cube is subdivided again, shrinking its volume to (1/3)^40 ≈ 8.2e-20. Since vol_tol defaults to 1e-16, this means the algorithm exits after only two iterations.
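
    The arithmetic behind this can be checked in a couple of lines:

```python
n = 20
vol_tol = 1e-16  # scipy's default for direct

# After trisecting every dimension once, the central hyperrectangle
# has side length 1/3, so its volume is (1/3)^n
first_pass = (1 / 3) ** n
print(first_pass)  # about 2.87e-10

# Trisecting that cube again drops its volume to (1/3)^(2n),
# which is already below vol_tol
second_pass = (1 / 3) ** (2 * n)
print(second_pass)  # about 8.2e-20
print(second_pass < vol_tol)  # True
```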

    If you don't want vol_tol to cause DIRECT to exit early, you can set vol_tol to zero:

    result = direct(score, locally_biased=False, bounds=bounds, maxiter=10000, maxfun=10000, vol_tol=0)
    

    Running this, it finds a better solution, though still not an optimal one:

     message: Number of function evaluations done is larger than maxfun=10000
     success: False
      status: 1
         fun: 1.1111111111111112
           x: [ 3.889e+00  3.889e+00 ...  5.000e+00  5.000e+00]
         nit: 12
        nfev: 12021
    

    Of course, you could also tackle this by making the objective function easier to optimize, e.g. by making the parity_in_range term leaky.

    Altering the objective function to be continuous

    It's frequently easier to optimize an objective function if that function is continuous.

    In the following graph, the blue line represents the existing parity_in_range function for each value, ignoring the mod 3.

    The orange line represents a new function, which slopes down toward the edge, giving the optimizer a hint that there is a lower value in that direction.

    [objective function graph]

    First, define the primitives that make up this curve. I'm using a sigmoid function as a continuous approximation of the step function.

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    

    Next, we need to be able to shift the center of this function around and to control how sharply the sigmoid rises and falls.

    def sigmoid_at_center(x, center, strength=1):
        return sigmoid((x - center) * strength)
    

    Next, define the parity function as a sigmoid centered at 4, minus a sigmoid centered at 6. The strength parameter is set to 10.

    def get_leaky_parity(x):
        return sigmoid_at_center(x, 4, 10) - sigmoid_at_center(x, 6, 10)
    
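    As a quick sanity check, this leaky parity is close to 0 well outside [4, 6], close to 1 well inside it, and takes intermediate values near the edges, which is exactly the downhill slope the optimizer needs:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_at_center(x, center, strength=1):
    return sigmoid((x - center) * strength)

def get_leaky_parity(x):
    return sigmoid_at_center(x, 4, 10) - sigmoid_at_center(x, 6, 10)

# Near 0 outside the range, near 1 inside it
print(get_leaky_parity(np.array([3.0, 5.0, 7.0])))  # roughly [0, 1, 0]

# Just inside the edges the penalty is only partial, so the surface
# slopes downward toward the boundary
print(get_leaky_parity(np.array([4.1, 5.9])))  # roughly [0.73, 0.73]
```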

    Finally, define the score function in terms of this function.

    def score(x):
        parity_in_range = get_leaky_parity(x).sum()
        main_score = np.max(np.abs(np.diff(x)))
        return main_score + parity_in_range
    

    You can then use the following code to have DIRECT optimize this. I found that enabling local bias let it solve the problem much faster.

    result = direct(score, locally_biased=True, bounds=bounds, vol_tol=0, len_tol=0.001)
    

    With this change to the objective function, it's able to solve this problem in up to 96 dimensions.
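
    Putting the pieces together, here is a self-contained version of the reworked problem (same length and bounds as in the question); treat it as a sketch of the approach rather than a canonical script:

```python
import numpy as np
from scipy.optimize import direct

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_at_center(x, center, strength=1):
    return sigmoid((x - center) * strength)

def get_leaky_parity(x):
    # Continuous stand-in for "count of entries in [4, 6]"
    return sigmoid_at_center(x, 4, 10) - sigmoid_at_center(x, 6, 10)

def score(x):
    parity_in_range = get_leaky_parity(x).sum()
    main_score = np.max(np.abs(np.diff(x)))
    return main_score + parity_in_range

length = 20
bounds = [(0, 10)] * length
result = direct(score, locally_biased=True, bounds=bounds,
                vol_tol=0, len_tol=0.001)
print(result.fun)  # ideally close to 0
print(result.x)
```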

    Sources used: Lipschitzian Optimization Without the Lipschitz Constant