Given a matrix of scores, I'd like to select exactly n elements from each column and each row such that the total score of the selected elements across the entire matrix is as high as possible.
Example: given the score matrix
array([[0.65500799, 0.79214695, 0.39854742],
       [0.53634974, 0.3682463 , 0.99663978],
       [0.73423119, 0.87150676, 0.80823699]])
The optimal selection for n=1 is:
array([[1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.]])
The total score of this solution is 0.65500799 + 0.87150676 + 0.99663978 = 2.52315453.
The optimal selection for n=2 is:
array([[1., 1., 0.],
       [1., 0., 1.],
       [0., 1., 1.]])
The total score of this solution is 0.65500799 + 0.53634974 + 0.79214695 + 0.87150676 + 0.99663978 + 0.80823699 = 4.65988821.
These solutions were obtained by a naive breadth-first search (BFS). However, this approach isn't computationally feasible for larger problems (e.g., 10x10, n=2), because the run time explodes.
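The BFS code itself isn't shown, so as a stand-in here is a hypothetical exhaustive-search sketch (not the asker's BFS; the name `brute_force_select` is made up) that reproduces both examples. It picks n columns in every row, so the row constraint holds by construction, and keeps the best candidate whose column sums are also n. There are C(cols, n)^rows candidates, so a 10x10 matrix with n=2 already means 45^10 (about 3.4 × 10^16) combinations, which shows why the run time explodes.

```python
from itertools import combinations, product

import numpy as np


def brute_force_select(scores, n):
    """Exhaustive search (hypothetical helper): try every way of choosing n
    columns in each row and keep the best choice whose column sums are also n.
    Returns (-inf, None) if no feasible selection exists."""
    scores = np.asarray(scores, dtype=float)
    rows, cols = scores.shape
    # All candidate column sets for a single row (row sum = n by construction)
    row_choices = list(combinations(range(cols), n))
    best_score, best_sel = -np.inf, None
    for choice in product(row_choices, repeat=rows):
        sel = np.zeros((rows, cols))
        for i, chosen_cols in enumerate(choice):
            sel[i, list(chosen_cols)] = 1
        # Column constraint: every column must also contain exactly n picks
        if np.all(sel.sum(axis=0) == n):
            total = float((scores * sel).sum())
            if total > best_score:
                best_score, best_sel = total, sel
    return best_score, best_sel
```

Calling `brute_force_select(scores, 1)` on the example matrix should return the n=1 selection above with a total of about 2.52315453; this is only practical for tiny matrices.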
Questions:
Here is a solution based on integer programming (IP).
Decision variables: x[i,j] = 1 if we select the item in row i, column j, and 0 otherwise.
Parameters (inputs): s[i,j] = score for entry (i, j)
Formulation:
maximize    sum {i, j} s[i,j] * x[i,j]
subject to  sum {i} x[i,j] = n    for all j
            sum {j} x[i,j] = n    for all i
            x[i,j] in {0,1}       for all i, j
You can implement this in Python/PuLP or a solver-specific package such as gurobipy or docplex. I would expect these solvers to solve even moderately large instances of the problem to optimality (not heuristically) within a fraction of a second.