Given an n by n matrix of integers, I want to find a part of the matrix that has the maximum sum with some restrictions. In the version I can solve, I am allowed to draw two lines and remove everything below and/or to the right. These can be one horizontal line and one diagonal line at 45 degrees (going up and right). For example with this 10 by 10 matrix:
[[ 1, -3, -2, 2, -1, -3, 0, -2, 0, 0],
[-1, 3, 3, -3, 0, -1, 0, 0, -2, -2],
[-1, 0, -1, 0, 2, 1, 1, -3, 2, 1],
[-3, 1, -3, -1, 1, -3, -2, -1, -3, 1],
[ 1, -3, 1, -2, 2, 1, -3, 2, -3, 0],
[-1, -2, 0, -2, 2, -3, 3, -1, -1, 2],
[ 2, 2, -3, -1, 0, -1, 2, 0, 3, 0],
[-1, 3, -1, 1, -1, 0, 0, 3, -3, 0],
[ 3, 2, 1, 1, 2, 3, 0, 2, 0, -3],
[ 0, 3, 2, 0, -1, -2, 3, -3, -3, 1]]
An optimal sum is 3 which you get by the shaded area here:
If square
contains the 2d array then this code will find the location of the places where the bottom row should end to get the maximum sum:
max_sums = np.empty_like(square, dtype=np.int_)
max_sums[0] = np.cumsum(square[0])
for row_idx in range(1, dim):
cusum = np.cumsum(square[row_idx])
for col_idx in range(dim):
if col_idx < dim - 1:
max_sums[row_idx, col_idx] = cusum[col_idx] + max_sums[row_idx - 1, col_idx + 1]
else:
max_sums[row_idx, col_idx] = cusum[col_idx] + max_sums[row_idx - 1, col_idx]
maxes = np.argwhere(max_sums==max_sums.max()) # Finds all the locations of the max
print(f"The coordinates of the maximums are {maxes} with sum {np.max(max_sums)}")
I would like to be able to leave out regions of the matrix defined by consecutive rows when finding the maximum sum. In the simpler version that I can solve I am only allowed to leave out rows at the bottom of the matrix. That is one region in the terminology we are using. The added restriction is that I am only allowed to leave out two regions at most. For the example given above, let us say we leave out the first two rows and rows 3 to 5 (indexing from 0) and rerun the code above. We then get a sum of 26.
Excluded regions must exclude entire rows, not just parts of them.
I would like to do this for much larger matrices. How can I solve this problem?
Here is a solution that will handle any number of alternating include/exclude regions. (But always in that order.)
def optimize_exclude(matrix, blocks=5):
"""Finds the optimal exclusion pattern for a matrix.
matrix is an nxm array of arrays.
blocks is the number of include/exclude blocks. We
start with an include block, so 5 blocks will include
up to three stretches of rows and exclude the other 2.
It returns (best_sum, diag, boundaries). So a return of
(25, 15, [5, 7, 10, 12]) represents that we excluded
all entries (i, j) where 5 <= i < 7, or 10 <= i < 12, or
15 < i+j.
It will run in time O(n*(n+m)*blocks).
"""
sums = []
# We start with a best which is excluding everything.
best = (0, -1, [])
n = len(matrix)
if 0 == n:
# Handle trivial edge case.
return best
m = len(matrix[0])
for diag in range(n + m - 1):
if diag < n:
# Need to sum a new row.
sums.append(0)
# Add the diagonal of i+j = diag to sums.
for i in range(max(0, diag-m+1), min(diag+1, n)):
j = diag - i
sums[i] += matrix[i][j]
# Start dp. It will be an array of,
# (total, (boundary, (boundary, ...()...))) with the
# current block being the position. We include even
# blocks, exclude odd ones.
#
# At first we can be on block 0, sum 0, no boundaries.
dp = [(0, ())]
if 1 < blocks:
# If we can have a second block, we could start excluding.
dp.append((0, (0, ())))
for i, s in enumerate(sums):
# Start next_dp with basic "sum everything."
# We do this so that we can assume next_dp[block]
# is always there in the loop.
next_dp = [(dp[0][0] + s, dp[0][1])]
for block, entry in enumerate(dp):
total, boundaries = entry
# Are we including or excluding this block?
if 0 == block%2:
# Include row?
if next_dp[block][0] < total + s:
next_dp[block] = (total + s, boundaries)
# Start new block?
if block + 1 < blocks:
next_dp.append((total, (i, boundaries)))
else:
# Exclude row?
if next_dp[block][0] < total:
next_dp[block] = entry
# Start new block?
if block + 1 < blocks:
next_dp.append((total + s, (i, boundaries)))
dp = next_dp
# Did we improve best?
for total, boundaries in dp:
if best[0] < total:
best = (total, diag, boundaries)
# Undo the linked list for convenience.
total, diag, boundary_link = best
reversed_boundaries = []
while 0 < len(boundary_link):
reversed_boundaries.append(boundary_link[0])
boundary_link = boundary_link[1]
return (total, diag, list(reversed(reversed_boundaries)))
And let's test it on your example.
matrix = [[ 1, -3, -2, 2, -1, -3, 0, -2, 0, 0],
[-1, 3, 3, -3, 0, -1, 0, 0, -2, -2],
[-1, 0, -1, 0, 2, 1, 1, -3, 2, 1],
[-3, 1, -3, -1, 1, -3, -2, -1, -3, 1],
[ 1, -3, 1, -2, 2, 1, -3, 2, -3, 0],
[-1, -2, 0, -2, 2, -3, 3, -1, -1, 2],
[ 2, 2, -3, -1, 0, -1, 2, 0, 3, 0],
[-1, 3, -1, 1, -1, 0, 0, 3, -3, 0],
[ 3, 2, 1, 1, 2, 3, 0, 2, 0, -3],
[ 0, 3, 2, 0, -1, -2, 3, -3, -3, 1]]
for row in matrix:
print(row)
print(optimize_exclude(matrix))
As we hope, it finds the solution (26, 15, [0, 2, 3, 6])
. Which corresponds to a sum of 26, a diagonal of 15 (so exclude when 15 < i+j
- hits the end of the matrix by excluding from matrix[9][7]
on), and excluding rows 0,1
then rows 3,4,5
. All of which exactly matches your diagram.