OptaPlanner Scalability

I need to decrease the latency of obtaining results from OptaPlanner. Is it possible to run a job across multiple instances (and/or machines) in a cluster? I couldn't find any information on, or attempts at, achieving this.

Solution

First, use the benchmarker (see docs chapter) and look at the BEST_SCORE graphs; this will give you a lot of insight. Furthermore, as you try the techniques below, it lets you compare their usefulness objectively.

In the benchmarker report, look at the average score calculation count per second. If it's below 1 000, it's terrible. If it's above 10 000, it's good. To improve it, see the docs chapter about stepLimit benchmarking to figure out which score constraint (= score rule in DRL) is the bottleneck.
If the Construction Heuristic (CH) is taking too long, configure the CH's MoveSelectors explicitly (see docs chapter about advanced CH configuration) and do a limited selection. This can reduce the CH's run time from seconds to below a second even with 10 000 entities, at a small cost to the resulting score. Limited selection can be a great gain especially with 2 or more variables per entity. The cost on the resulting score can be
If it's VRP or TSP, use nearbySelection to scale.
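To illustrate why nearby selection scales, here is a minimal plain-Java sketch of the underlying idea: instead of considering every other location as a move target, only the closest few are considered. This is not OptaPlanner's API or its actual algorithm; the class `NearbyCandidates`, the method `nearest` and the hard cut-off `k` are hypothetical names and simplifications chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: not part of OptaPlanner.
final class NearbyCandidates {

    // Returns the indices of the k locations closest to 'origin'
    // (excluding 'origin' itself), nearest first.
    static List<Integer> nearest(double[][] xy, int origin, int k) {
        List<Integer> others = new ArrayList<>();
        for (int i = 0; i < xy.length; i++) {
            if (i != origin) {
                others.add(i);
            }
        }
        // Sort the remaining locations by Euclidean distance to the origin.
        others.sort(Comparator.comparingDouble(i -> distance(xy[origin], xy[i])));
        // Keep only the k nearest as candidate move targets.
        return new ArrayList<>(others.subList(0, Math.min(k, others.size())));
    }

    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }
}
```

With n locations, a move selector that only looks at k nearby candidates per location evaluates roughly n * k moves instead of n * n, which is where the gain on large VRP and TSP datasets comes from.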

We're working on adding single-tenant multi-VM parallel solving. Note that multi-tenant multi-VM parallel solving is already possible, if you implement it yourself.
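The do-it-yourself multi-tenant approach can be sketched roughly as follows: each tenant's dataset is an independent problem, so each can be handed to its own worker. This sketch uses a thread pool inside one JVM as a stand-in for multiple VMs, and the class `MultiTenantSolving`, the method `solveAll` and the `solveOne` function are hypothetical placeholders; in a real setup `solveOne` would wrap whatever call runs your solver on one dataset.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only: not part of OptaPlanner.
final class MultiTenantSolving {

    // Solves every tenant's problem independently on a fixed pool of workers
    // and returns the results keyed by tenant id, in submission order.
    static <P, S> Map<String, S> solveAll(Map<String, P> problemsByTenant,
                                          Function<P, S> solveOne,
                                          int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit one task per tenant; tasks do not share any state.
            Map<String, Future<S>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, P> entry : problemsByTenant.entrySet()) {
                P problem = entry.getValue();
                Callable<S> task = () -> solveOne.apply(problem);
                futures.put(entry.getKey(), pool.submit(task));
            }
            // Wait for each result.
            Map<String, S> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<S>> entry : futures.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while solving", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A tenant's solve failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

This raises throughput across tenants but does not reduce the latency of any single solve; spreading the tasks over several machines would follow the same pattern, with a job queue in place of the thread pool.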