I'm working on a project where I need to solve thousands of small to large "simple" instances of a knapsack-like problem. All my instances have the same structure and the same constraints, but vary in the number of items (and thus variables). The domains can be fixed for all the instances.
I've successfully integrated MiniZinc into a workflow that solves these instances.
I've now hit a performance bottleneck in the flattening phase for large instances (see the verbose flattening statistics below). Flattening takes up to 30 seconds, while solving to optimality requires less than 1 s.
I've tried disabling the "optimized flattening", to no avail. I've also tried splitting the toolchain (mzn2fzn then mzn-cbc), and finally tried separating the model and data definitions, but didn't observe any significant improvement.
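In case the structure matters, the model/data split I tried simply puts the declarations in a model file and the instance values in a data file, roughly like this (file names are illustrative and the actual values are elided):

% model.mzn
int: num_items;
int: num_label = 82;
array[1..num_items] of int: item_label;
... % constraints as below

% instance.dzn
num_items = ...;
item_label = [ ... ];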
If it helps, I've identified the set of constraints causing the problem:
...
array[1..num_items] of int: item_label = [1, 12, 81, 12, /* 10 000 more ints between 1..82 */ 53];
int: num_label = 82;
array[1..num_label] of var 0..num_items: cluster_distribution;
constraint forall(i in 1..num_label)(cluster_distribution[i]==sum(j in 1..num_items)(item_indicator[j] /\ item_label[j]==i));
var 1..num_label: non_zero_label;
constraint non_zero_label == sum(i in 1..num_label)(cluster_distribution[i] > 0);
....
Basically, each of the 5000 items has a label (out of ~100 possible labels), and I try (amongst other objectives) to maximise the number of different labels used (the non_zero_label variable). I've tried various formulations, but this one seems to be the most efficient.
Could anyone provide me with some guidance on how to accelerate this flattening phase? How would you approach the task of solving thousands of "similar" instances?
Would it be beneficial to directly generate the MPS file and then call CBC natively? I expect MiniZinc to be quite efficient at compiling MPS files, but maybe I could get a speedup by exploiting the repeated structure of the instances? I fear, however, that this would more or less boil down to re-coding a poorly written, half-baked, pseudo-custom version of MiniZinc, which doesn't feel right.
Thanks!
My system info: OS X and Ubuntu, MiniZinc 2.1.7.
Compiling instance-2905-gd.mzn
MiniZinc to FlatZinc converter, version 2.1.7
Copyright (C) 2014-2018 Monash University, NICTA, Data61
Parsing file(s) 'instance-2905-gd.mzn' ...
processing file '../share/minizinc/std/stdlib.mzn'
processing file '../share/minizinc/std/builtins.mzn'
processing file '../share/minizinc/std/redefinitions-2.1.1.mzn'
processing file '../share/minizinc/std/redefinitions-2.1.mzn'
processing file '../share/minizinc/linear/redefinitions-2.0.2.mzn'
processing file '../share/minizinc/linear/redefinitions-2.0.mzn'
processing file '../share/minizinc/linear/redefinitions.mzn'
processing file '../share/minizinc/std/nosets.mzn'
processing file '../share/minizinc/linear/redefs_lin_halfreifs.mzn'
processing file '../share/minizinc/linear/redefs_lin_reifs.mzn'
processing file '../share/minizinc/linear/domain_encodings.mzn'
processing file '../share/minizinc/linear/redefs_bool_reifs.mzn'
processing file '../share/minizinc/linear/options.mzn'
processing file '../share/minizinc/std/flatzinc_builtins.mzn'
processing file 'instance-2905-gd.mzn'
done parsing (70 ms)
Typechecking ... done (13 ms)
Flattening ... done (16504 ms), max stack depth 14
MIP domains ...82 POSTs [ 82,0,0,0,0,0,0,0,0,0, ], LINEQ [ 0,0,0,0,0,0,0,0,82, ], 82 / 82 vars, 82 cliques, 2 / 2 / 2 NSubIntv m/a/m, 0 / 127.085 / 20322 SubIntvSize m/a/m, 0 clq eq_encoded ... done (28 ms)
Optimizing ... done (8 ms)
Converting to old FlatZinc ... done (37 ms)
Generated FlatZinc statistics:
Variables: 21258 int, 20928 float
Constraints: 416 int, 20929 float
This is a minimization problem.
Printing FlatZinc to '/var/folders/99/0zvzbfcj3h16g04d07w38wrw0000gn/T/MiniZinc IDE (bundled)-RzF4wk/instance-2905-gd.fzn' ... done (316 ms)
Printing .ozn to '/var/folders/99/0zvzbfcj3h16g04d07w38wrw0000gn/T/MiniZinc IDE (bundled)-RzF4wk/instance-2905-gd.ozn' ... done (111 ms)
Maximum memory 318 Mbytes.
Flattening done, 17.09 s
If you are solving many instantiations of the same problem class and you are running into large flattening times, there are two general approaches you can take: either you optimise the MiniZinc model to minimise the flattening time, or you implement the model directly against a solver API.
The first option is best if you want to keep the model general across solvers. To reduce the flattening time, the main thing you want to eliminate is "temporary" variables: variables that are created during flattening and then thrown away again. A lot of flattening time goes into resolving variables that are not actually necessary, which is done to keep the search space small while solving. Temporary variables can be generated by comprehensions, loops, let-expressions, and reifications. For your specific model, Gleb posted an optimisation of the loop in your constraint:
constraint forall(i in 1..num_label)(cluster_distribution[i]==sum(j in 1..num_items where item_label[j]==i)(item_indicator[j]));
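The where clause filters on parameters only, so the flattener no longer has to build and then simplify the item_indicator[j] /\ item_label[j]==i term (and the Boolean-to-integer coercion of its result) for every (item, label) pair. If evaluating the where clause for every label still shows up in the flattening time, you could go one step further and precompute the item indices per label when you generate the data file. A sketch only (items_with_label is an illustrative name, not something your model already contains):

array[1..num_label] of set of int: items_with_label; % filled in the .dzn by your instance generator
constraint forall(i in 1..num_label)(
  cluster_distribution[i] == sum(j in items_with_label[i])(item_indicator[j])
);

This mostly moves the filtering work into the code that writes the .dzn, so it is only worth doing if the where clause itself still dominates.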
The other option might be best if you want to integrate your model into a software product, as you can interact with the solver directly. You will have to "flatten" your model/data manually, but you can do it in a limited way that is specific to your purpose. Because it is not general-purpose, it can be very quick. Hints for how to do this can be found by looking at the FlatZinc generated for different instances.
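As a rough illustration of what that manual flattening could look like for the label-counting part of your model (a sketch only, written in MiniZinc syntax for readability; label_used is an illustrative name, and these are essentially the rows you would emit yourself into an MPS file or through the CBC API):

% One 0/1 variable per label, with cluster_distribution[i] > 0 linearised by hand.
% Big-M here is num_items; the number of items carrying label i would be a tighter bound.
array[1..num_label] of var 0..1: label_used;
constraint forall(i in 1..num_label)(
  label_used[i] <= cluster_distribution[i] /\
  cluster_distribution[i] <= num_items * label_used[i]
);
constraint non_zero_label == sum(i in 1..num_label)(label_used[i]);

This is roughly the linearisation the linear library produces for you; generating it directly for your fixed problem structure is what lets you skip the general-purpose flattening.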