I need an algorithm to optimize the time of the week that I show a message to a user to ensure the highest probability that the user will click the message.
When the message is shown, a database entry will be updated with the day/time and whether or not the user clicked. The goal is to maximize the click-through rate.
I am very comfortable with using Bayesian Bandits (also known as Thompson Sampling) (https://github.com/omphalos/bayesian-bandit.js) to optimize in the case of N discrete parameters, but I am at a loss on how to apply this to a continuous value.
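For context, here is the discrete setup I'm comfortable with — a minimal Beta-Bernoulli Thompson sampling sketch over N arms (e.g. treating each hour of the week as a separate arm). The class name and the extreme counts in the usage line are illustrative, not from the linked library:

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over N discrete arms."""

    def __init__(self, n_arms):
        self.successes = [1] * n_arms  # Beta(1, 1) uniform prior per arm
        self.failures = [1] * n_arms

    def select_arm(self):
        # Draw one plausible CTR per arm from its Beta posterior,
        # then play the arm with the highest draw.
        draws = [random.betavariate(s, f)
                 for s, f in zip(self.successes, self.failures)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, clicked):
        if clicked:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Usage: 168 arms, one per hour of the week.
sampler = ThompsonSampler(168)
arm = sampler.select_arm()
sampler.update(arm, clicked=True)
```

The problem is that this treats each time slot as independent, throwing away the fact that 9:00 and 9:30 should have similar click rates.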
I'm well aware of standard hill-climbing algorithms, but I only understand how to apply hill climbing in the absence of statistical noise. Is there a simple Bayesian way to do hill climbing that optimizes the exploration/exploitation tradeoff?
For bonus points, is there a method that can be generalized to multiple dimensions, so optimizing multiple parameters simultaneously to find optimal points in a multi-dimensional space?
I suggest you model the reward function as a Gaussian process to make this nice and Bayesian in the presence of continuous parameters. Essentially you have a regression problem where payoff(t) is a function to be estimated for continuous t, and you want a strategy for picking values of t that trades off exploration (regions of the input space where the posterior variance is high) against exploitation (regions where the posterior mean is high).
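A minimal sketch of one such strategy, Thompson sampling on a GP posterior: fit a GP to the observed (time, click) pairs, draw one plausible payoff function from the posterior, and show the message at that function's argmax. Everything here is illustrative — the RBF kernel, its length scale, the noise level, and the centering of click outcomes are assumptions you would tune, and a real deployment would use a proper library rather than the naive matrix inverse below:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(a, b, length_scale=12.0, variance=0.05):
    """Squared-exponential kernel over hours of the week:
    times a few hours apart get correlated click rates."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def choose_time(times_seen, clicks, candidates, noise=0.05):
    """Thompson sampling: sample one function from the GP
    posterior over candidate times and play its argmax."""
    if len(times_seen) == 0:
        return rng.choice(candidates)  # no data yet: explore uniformly
    X = np.asarray(times_seen, dtype=float)
    y = np.asarray(clicks, dtype=float) - 0.5  # center around a neutral prior mean
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, candidates)
    K_ss = rbf_kernel(candidates, candidates)
    K_inv = np.linalg.inv(K)  # fine for a sketch; use Cholesky in practice
    mu = K_s.T @ K_inv @ y
    cov = K_ss - K_s.T @ K_inv @ K_s
    # One posterior draw; jitter keeps the covariance numerically PSD.
    sample = rng.multivariate_normal(mu, cov + 1e-6 * np.eye(len(candidates)))
    return candidates[np.argmax(sample)]

# Usage: one candidate slot per hour of the week (0..167).
candidates = np.arange(0.0, 168.0, 1.0)
t = choose_time([10.0, 50.0, 120.0], [1, 0, 1], candidates)
```

This also answers the bonus question: nothing above is specific to one dimension. If t becomes a vector (say, hour-of-day and day-of-week as separate inputs), you only need to swap in a kernel over vectors, and the same posterior-sampling loop optimizes all parameters jointly.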
There's prior work on this, for example this paper and other works by the author.