python numpy numpy-ndarray array-broadcasting numpy-slicing

Adding block-constant matrices in numpy

Let's say we have a

n*n matrix A
m*m matrix B
a vector b=[b_1,...,b_m] of block sizes with b_1 + ... + b_m = n and b_i >= 1.

Then I define a n*n block matrix B_block whose (i,j)-th block is a b_i*b_j matrix of constant value B_{ij}. How can I efficiently compute A+B_block in numpy?

So far I am doing

A = np.arange(36).reshape(6,6)
B = np.arange(9).reshape(3,3)
blocks_sizes = np.array([3,2,1])
B_block = np.block([[np.ones((a,b))*B[i,j] for j,b in enumerate(block_sizes)] for i,a in enumerate(block_sizes)])
C = A + B_block
print(A)
print(B_block)
print(C)

resulting in

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 31 32 33 34 35]]
[[0. 0. 0. 1. 1. 2.]
 [0. 0. 0. 1. 1. 2.]
 [0. 0. 0. 1. 1. 2.]
 [3. 3. 3. 4. 4. 5.]
 [3. 3. 3. 4. 4. 5.]
 [6. 6. 6. 7. 7. 8.]]
[[ 0.  1.  2.  4.  5.  7.]
 [ 6.  7.  8. 10. 11. 13.]
 [12. 13. 14. 16. 17. 19.]
 [21. 22. 23. 25. 26. 28.]
 [27. 28. 29. 31. 32. 34.]
 [36. 37. 38. 40. 41. 43.]]

which works fine but seems inefficient. Is there a numpy-solution which does not require constructing the block matrix by hand, or python-loops?

If all the blocks were of the same size, then I could use np.kron...

Solution

B_block = B[np.arange(len(blocks_sizes)).repeat(blocks_sizes)][:, np.arange(len(blocks_sizes)).repeat(blocks_sizes)]
C = A + B_block

my performance:

4.36 µs ± 93.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

your performance:

83.1 µs ± 34.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

If you prefer you can also define slice_ids = np.arange(len(blocks_sizes)).repeat(blocks_sizes) in order to make the lighter and so:

B_block = B[slice_ids][:, slice_ids]
C = A + B_block