Tags: python, mathematical-optimization

Minimising a slow, noisy, not well-defined target function


My question is: are there minimisation algorithms, preferably implemented in Python, that can operate on a function that is slow (~1-10 s per call) and takes its data from a live system, such that the whole optimisation takes no more than a couple of hours?

I have an FPGA that runs a filter over some sensor data, and uses the output of this filter to improve the performance of another device. I would like to find the optimal filter. My attempts at modelling the system and using various signal processing techniques did not produce adequate results, so now I'm going to attempt to solve this on the live system itself (if anything, just to prove that such an optimal filter is possible).

The filter can be programmed over the serial line, and the performance of the other device can be measured over the serial line.

So I can construct a function which:

  • Takes parameters that define a filter
  • Programs the filter via the serial line
  • Acquires data via the serial line
  • Computes a measure of how good the filter is (in the sense that smaller is better)
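
A minimal sketch of such a wrapper, assuming pyserial and a made-up command protocol (the `SET`/`MEASURE` commands, the port name, and the sample count are placeholders for whatever the real device expects):

```python
import numpy as np
import serial  # pyserial


def filter_cost(params):
    """Program the filter with `params`, then measure how well the
    other device performs. Smaller return value means a better filter.
    """
    with serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=10) as port:
        # Program the filter over the serial line (~1.5 s).
        cmd = "SET " + " ".join(f"{p:.6f}" for p in params)
        port.write(cmd.encode() + b"\n")
        port.readline()  # wait for the acknowledgement

        # Acquire performance data (~6 s).
        port.write(b"MEASURE\n")
        samples = np.array([float(port.readline()) for _ in range(100)])

    # Reduce the raw data to a single goodness-of-filter number.
    return samples.mean()
```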

This means I have a function that can be used as a target for minimisation. Here are the problems though:

It's slow

To program the filter takes about 1.5s, to acquire the data to measure the goodness of the filter takes about 6s. All up, that's nearly 8s per function call. In other words, calling it just 500 times would take more than an hour. Even speeding up the communications and computation would probably not change this by an order of magnitude.

It's not well defined

(Note that x below is a vector in the parameter space of my target function.)

To put it simply, x1 == x2 does not imply f(x1) == f(x2). Because the system is noisy, sampling the target function f(x) twice at the same point in its parameter space can yield two different results.

The first thing that occurred to me was to have the target function actually average several measurements, and increase the tolerance value of whatever minimisation routine I'm running. But in looking at the actual numbers, in the worst case I could have the (mean) value of f(x) change by 2.0 over the full range of parameters, but the sample standard deviation is 1.6. This means that if I want to reduce the standard error (s/sqrt(n)) to, say, 0.1 I'd need to measure the same point 250 times, which makes each measurement take 30 minutes. Yay.
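
As a back-of-the-envelope sketch of that averaging wrapper (built on the `filter_cost` function sketched above):

```python
import numpy as np


def averaged_cost(params, sample_sd=1.6, target_se=0.1):
    """Repeat the measurement until the standard error drops to target_se."""
    # s / sqrt(n) <= target_se  =>  n >= (s / target_se)**2 = (1.6 / 0.1)**2 = 256
    n = int(np.ceil((sample_sd / target_se) ** 2))
    return np.mean([filter_cost(params) for _ in range(n)])
```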

There are tricks I can pull to improve this, say to get a swing of ~20 over the parameter range with a standard deviation of 0.25 at any given point. But these tricks have other tradeoffs in time.

Concessions

On the bright side, plotting the function (greatly averaged) over the whole optimisation space (which I've done to confirm that there is indeed a global minimum) shows that the thing is actually reasonably smooth and the minimum value is not too sharp. The other bright side is that the metric only needs to be optimised to two or three significant figures. If it were not so slow, optimising it would be easy.

I've started looking at the minimisation routines in SciPy, but since many of the parameters are undocumented or interdependent, it's a bit of a walk in the dark (with each step taking several hours).
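
For what it's worth, one way to make the cost explicit is to pick a derivative-free routine and cap its evaluation budget; here is a sketch using SciPy's Powell method (the tolerances are guesses to illustrate the options, not recommendations):

```python
import numpy as np
from scipy.optimize import minimize

x0 = np.zeros(4)  # hypothetical starting filter parameters

# At ~8 s per call, maxfev=300 bounds the whole run to roughly 40 minutes.
result = minimize(
    filter_cost,              # or a lightly averaged version of it
    x0,
    method="Powell",          # derivative-free, so noise can't corrupt a gradient
    options={
        "maxfev": 300,        # hard cap on function evaluations
        "xtol": 1e-2,         # only 2-3 significant figures are needed
        "ftol": 1e-1,
    },
)
print(result.x, result.fun)
```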

It strikes me that what I really need is an optimisation algorithm that is known to work with the fewest possible function calls; although maybe there's another approach that I haven't considered.


Solution

  • I think that this is a reasonable use case for Metropolis optimization. This is one of the earliest examples of a Markov chain Monte Carlo method, and it can be applied more or less unchanged to your use case.

    On each step you propose a random move in your parameter space and define the fitness as exp(-thing_to_minimize / T) for some temperature T. Accept any proposed step where the fitness has grown, and accept the others randomly with probability proposed_fitness / current_fitness. After it's been running for a while, simply start averaging the location in parameter space.

    You can add a simulated annealing aspect by reducing the mean step size as a function of time for an extra frill.
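
    A minimal sketch of that recipe (the step count, temperature, and step size are illustrative, not tuned; at ~8 s per call, 300 steps is roughly 40 minutes):

    ```python
    import numpy as np


    def metropolis_minimize(cost, x0, n_steps=300, step0=0.5,
                            temperature=1.0, burn_in=100, seed=None):
        """Metropolis walk over the parameter space, drifting toward low cost.

        Always accepts downhill proposals; accepts uphill ones with probability
        exp(-(f_new - f_old) / temperature), i.e. the ratio of the fitnesses.
        The mean step size shrinks linearly with time, which adds the
        simulated-annealing aspect mentioned above.
        """
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        fx = cost(x)
        visited = []
        for i in range(n_steps):
            step = step0 * (1.0 - i / n_steps)              # anneal the step size
            x_new = x + rng.normal(0.0, step, size=x.shape)
            f_new = cost(x_new)
            if f_new <= fx or rng.random() < np.exp(-(f_new - fx) / temperature):
                x, fx = x_new, f_new                        # accept the proposal
            if i >= burn_in:
                visited.append(x.copy())                    # record the location
        # Average the post-burn-in locations, as suggested above.
        return np.mean(visited, axis=0)
    ```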


    I've written about this on Stack Overflow a few times before, but you'll find my most complete description on Software to Tune/Calibrate Properties for Heuristic Algorithms.