Tags: python, search, computation-theory, integer-programming, discrete-optimization

Discrete optimization - selecting exactly n items from each row and column of a score matrix


Given a matrix of scores, I'd like to select exactly n elements from each row and each column such that the total score of the selected elements is as high as possible.

Example: given the score matrix

array([[0.65500799, 0.79214695, 0.39854742],
       [0.53634974, 0.3682463 , 0.99663978],
       [0.73423119, 0.87150676, 0.80823699]])

The optimal selection for n=1 is:

array([[1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.]])

The total score of this solution is 0.65500799 + 0.87150676 + 0.99663978 = 2.52315453.
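For n=1 on a square matrix, this is the classic linear assignment problem, which can be solved exactly in polynomial time. A minimal sketch using SciPy's `linear_sum_assignment`, assuming SciPy 1.4 or later for the `maximize=True` argument; the variable names are illustrative only:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

scores = np.array([[0.65500799, 0.79214695, 0.39854742],
                   [0.53634974, 0.3682463 , 0.99663978],
                   [0.73423119, 0.87150676, 0.80823699]])

# n = 1: pick exactly one entry per row and per column, maximizing the sum.
rows, cols = linear_sum_assignment(scores, maximize=True)

# Build the 0/1 selection matrix in the same shape as the question's output.
selection = np.zeros_like(scores)
selection[rows, cols] = 1.0
total = scores[rows, cols].sum()

print(selection)
print(total)
```

This only covers n=1; it does not by itself handle n>1.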

The optimal selection for n=2 is:

array([[1., 1., 0.],
       [1., 0., 1.],
       [0., 1., 1.]])

The total score of this solution is 0.65500799 + 0.53634974 + 0.79214695 + 0.87150676 + 0.99663978 + 0.80823699 = 4.65988821.
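Both example selections can be checked mechanically. The sketch below uses a hypothetical helper `check_selection` that confirms a 0/1 matrix picks exactly n entries in every row and column and returns its total score:

```python
import numpy as np

scores = np.array([[0.65500799, 0.79214695, 0.39854742],
                   [0.53634974, 0.3682463 , 0.99663978],
                   [0.73423119, 0.87150676, 0.80823699]])


def check_selection(scores, selection, n):
    """Return the total score of a 0/1 selection; raise ValueError if it is infeasible."""
    sel = np.asarray(selection)
    if not np.isin(sel, (0, 1)).all():
        raise ValueError("selection must contain only 0 and 1")
    # Row sums (axis=1) and column sums (axis=0) must all equal n.
    if not (sel.sum(axis=1) == n).all() or not (sel.sum(axis=0) == n).all():
        raise ValueError(f"selection does not pick exactly {n} per row and column")
    return float((scores * sel).sum())


sel_n1 = np.array([[1., 0., 0.],
                   [0., 0., 1.],
                   [0., 1., 0.]])
sel_n2 = np.array([[1., 1., 0.],
                   [1., 0., 1.],
                   [0., 1., 1.]])

print(check_selection(scores, sel_n1, 1))
print(check_selection(scores, sel_n2, 2))
```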

These solutions were obtained by a naive breadth-first search (BFS). However, this approach is not computationally feasible for larger problems (e.g., 10x10 with n=2) because the run time explodes.
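The asker's BFS code is not shown, so the following is a stand-in: a plain brute-force enumeration (hypothetical names `brute_force` and `candidate_count`) that chooses n columns in every row and keeps only combinations in which every column is also used exactly n times. On an m x m matrix it visits C(m, n)^m candidates, so 10x10 with n=2 already means 45^10 (roughly 3.4 × 10^16) combinations, which illustrates the blow-up:

```python
from itertools import combinations, product
from math import comb


def brute_force(scores, n):
    """Exhaustively try every way of picking n columns per row; return (best_total, best_pick)."""
    n_rows, n_cols = len(scores), len(scores[0])
    # All ways to choose n column indices within a single row.
    row_choices = list(combinations(range(n_cols), n))
    best_total, best_pick = float("-inf"), None
    for pick in product(row_choices, repeat=n_rows):
        # Count how often each column is used across all rows.
        col_counts = [0] * n_cols
        for cols in pick:
            for j in cols:
                col_counts[j] += 1
        if any(count != n for count in col_counts):
            continue  # infeasible: some column is not picked exactly n times
        total = sum(scores[i][j] for i, cols in enumerate(pick) for j in cols)
        if total > best_total:
            best_total, best_pick = total, pick
    return best_total, best_pick


def candidate_count(m, n):
    """Number of row-wise candidates the enumeration visits on an m x m matrix."""
    return comb(m, n) ** m


scores = [[0.65500799, 0.79214695, 0.39854742],
          [0.53634974, 0.3682463 , 0.99663978],
          [0.73423119, 0.87150676, 0.80823699]]

print(brute_force(scores, 1))
print(brute_force(scores, 2))
print(candidate_count(3, 2), candidate_count(10, 2))
```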

Questions:

  1. How is this discrete optimization problem classified?
  2. What heuristics can allow finding good solutions for this problem quickly?
  3. What Python libraries implement these heuristics?

Solution

  • Here is a solution based on integer programming (IP).

    Decision variables: x[i,j] = 1 if we select the item in row i, column j.

    Parameters (inputs): s[i,j] = score for entry (i, j)

    Formulation:

    maximize sum {i, j} s[i,j] * x[i,j]
    subject to sum {i} x[i,j] = n     for all j
               sum {j} x[i,j] = n     for all i
               x[i,j] in {0,1}        for all i, j
    

    You can implement this in Python with PuLP, or with a solver-specific package such as gurobipy or docplex. I would expect these solvers to solve even moderately large instances of the problem to optimality (not heuristically) within a fraction of a second.
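A minimal sketch of the formulation above, assuming SciPy 1.9 or later for `scipy.optimize.milp` (which calls the bundled HiGHS solver). SciPy is swapped in here only as a convenient alternative to the PuLP, gurobipy or docplex packages the answer mentions; the function name `select_n` is hypothetical. `milp` minimizes, so the scores are negated:

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def select_n(scores, n):
    """Maximize sum s[i,j]*x[i,j] with x binary and exactly n picks per row and column."""
    s = np.asarray(scores, dtype=float)
    r, c = s.shape
    k = r * c
    # x is s flattened row-major, so x[i*c + j] corresponds to entry (i, j).
    row_sums = np.kron(np.eye(r), np.ones((1, c)))  # row i of this block: sum over j of x[i,j]
    col_sums = np.kron(np.ones((1, r)), np.eye(c))  # row j of this block: sum over i of x[i,j]
    a = np.vstack([row_sums, col_sums])
    rhs = np.full(r + c, n, dtype=float)
    res = milp(
        c=-s.ravel(),                            # negate: milp minimizes
        integrality=np.ones(k, dtype=int),       # every x[i,j] is integer ...
        bounds=Bounds(np.zeros(k), np.ones(k)),  # ... and lies in [0, 1], i.e. {0, 1}
        constraints=LinearConstraint(a, rhs, rhs),  # lb == ub == n gives equalities
    )
    if not res.success:
        raise ValueError(f"no feasible selection: {res.message}")
    x = np.round(res.x).reshape(r, c)
    return x, float((s * x).sum())


scores = np.array([[0.65500799, 0.79214695, 0.39854742],
                   [0.53634974, 0.3682463 , 0.99663978],
                   [0.73423119, 0.87150676, 0.80823699]])

for n in (1, 2):
    x, total = select_n(scores, n)
    print(f"n={n}: total={total:.8f}")
    print(x)
```

The model is only feasible when the matrix is square (for n > 0) and n does not exceed its size, since the row constraints add up to r·n selections and the column constraints to c·n. Each x[i,j] appears in exactly one row constraint and one column constraint, so the constraint matrix is the incidence matrix of a bipartite graph and is totally unimodular; the LP relaxation therefore already has integral optimal vertices. In graph terms this is a maximum-weight bipartite b-matching (a transportation problem with unit capacities), which is one way to classify the problem and helps explain why solvers handle it quickly. Like most MIP solvers, HiGHS stops once the optimality gap falls below a small default tolerance, which only matters for near-ties.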