Finding which inputs yield the highest value from a score function

I have a function that returns a score that is calculated based on inputs. Below is a very simple example.

def score(i, j):
    if i <= 2 and j <= 3:
        return i * j
    return 0

We can see that the ideal inputs for the function are i=2 and j=3, giving the highest score possible of 6. I would like a method so that I can search for these ideal input values. The simplest would be

max_score = 0
ideal_i = ideal_j = None
for i in range(5):
    for j in range(5):
        s = score(i, j)
        if s > max_score:
            ideal_i, ideal_j = i, j
            max_score = s
print(ideal_i, ideal_j)  # prints 2 3

But what if the ideal values are floats? What if they are outside the bounds of 0 and 4? What if I have 3 inputs I want to find?

This feels like a machine learning & calculus problem, but I am unfamiliar with an implementation that would work. I imagine a 2D graph with i and j on the x and y axis, and some heat map showing the value of score(i, j) for each coordinate. The algorithm would search around for a local maximum in this graph (similar to machine learning searching for a local minimum loss value). Any insight into what this would be like, or what to search for would be appreciated.

Solution

What you're looking for is probably Black-Box optimization or Derivative-free optimization. The problem formulation is to optimize some function of which you don't know the analytical form (a black-box, you can only evaluate inputs to the function) and you don't have (direct) access to the derivatives. Note that these methods do not guarantee to find a global optimum.

There are several methods to do so, the most simple one works similar for floats as it does for your example with integers. You define a equally spaced grid, evaluate the function at every grid point and keep track of the maximum. However if you don't know the function bounds or your function is high-dimensional this method scales bad and may fail to find any "good" optimum. A slightly better approach is to define a non-uniform or random grid of points, this would be then called Monte Carlo method and it scales better in high-dimensions. A more sophisticated version of this inspired by physics is Simulated Annealing. However the problem of not knowing the bounds still remains.

A more complex approach would be Evolution Strategy algorithms like CMA-ES. In a nutshell, these class of algorithms propose a generation of points at every iteration step, select the best ones among those points and propose the next generation of points based on the selected ones. Often times this is done by proposing a probability function (e.g. Normal) and using maximum likelihood method with the selected points to optimize the parameters of those function. Advantage here is that you don't have to define or know the bounds of your function a priori.

A third way to optimize such a black-box function is Bayesian optimization, which is particular useful if evaluating the back-box function is expensive or when there is noise involved. The idea is to propose a prior function for the function to optimize, measure at certain points (this is the crucial part) and update our believe on how the real function looks like. This is done in an iterative fashion often by using Gaussian Processes as class of functions. To be honest, this method would be kinda overkill for your simple example.

To address your last point regarding Neural Networks: Neural Networks are function approximators, that is they can represent any function, mapping from some input space to some output space, the inputs and outputs are already given (data and labels). NNs are parametrized and have a loss function for which the analytical form is known (e.g. Mean-Squared-Error, Cross-Entropy, etc.). Therefore it is possible to use methods that access derivatives like Stochastic Gradient Descent to find the best parameters. In you case the function is already given (although no analytical form of it) and you like to find the input that maximizes it.