Tags: python, arrays, numpy, indices, numpy-slicing

How can I write zeros to a 2D numpy array by both row and column indices?


I have a large (90k x 90k) numpy ndarray and I need to zero out a block of it. I have a list of about 30k indices that indicate which rows and columns need to be zero. The indices aren't necessarily contiguous, so a[min:max, min:max] style slicing isn't possible.

As a toy example, I can start with a 2D array of non-zero values, but I can't seem to write zeros the way I expect.

import numpy as np

a = np.ones((6, 8))
indices = [2, 3, 5]
# I thought this would work, but it does not.
# It correctly writes to (2, 2), (3, 3), and (5, 5), but not to the other
# combinations: (2, 3), (2, 5), (3, 2), (3, 5), (5, 2), and (5, 3).
a[indices, indices] = 0.0
print(a)

[[1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 0. 1. 1. 1. 1. 1.]
 [1. 1. 1. 0. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 0. 1. 1.]]
# I thought this would fix that problem, but it doesn't change the array
# (starting again from a fresh array of ones).
a = np.ones((6, 8))
a[indices, :][:, indices] = 0.0
print(a)

[[1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]]

In this toy example, I'm hoping for this result.

[[1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 0. 0. 1. 0. 1. 1.]
 [1. 1. 0. 0. 1. 0. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 0. 0. 1. 0. 1. 1.]]

I could probably write a cumbersome loop or build some combinatorially huge list of indices to do this, but it seems intuitive that this must be supported in a cleaner way. I just can't find the syntax to make it happen. Any ideas?


Solution

  • Based on hpaulj's comment, I came up with this, which works perfectly on the toy example.

    a[np.ix_(indices, indices)] = 0.0
    print(a)
    
    [[1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 0. 0. 1. 0. 1. 1.]
     [1. 1. 0. 0. 1. 0. 1. 1.]
     [1. 1. 1. 1. 1. 1. 1. 1.]
     [1. 1. 0. 0. 1. 0. 1. 1.]]
    

    It also worked beautifully on the real data. It was faster than I expected and didn't noticeably increase memory consumption. Exhausting memory has been a constant concern with these giant arrays.
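
For anyone wondering why this works when the two attempts in the question did not, here is a small sketch on the same toy array. The variable names (rows, cols, idx, b, c) are only for illustration. As far as I understand it, np.ix_ reshapes the index list into a column vector and a row vector, and NumPy broadcasts the two against each other, so every row/column combination is selected. The chained form a[indices, :][:, indices] fails because the first indexing step returns a copy, and the assignment lands in that temporary copy.

```python
import numpy as np

a = np.ones((6, 8))
indices = [2, 3, 5]

# np.ix_ turns the 1-D index list into a column vector and a row vector.
rows, cols = np.ix_(indices, indices)
print(rows.shape, cols.shape)  # (3, 1) (1, 3)

# Broadcasting the two index arrays selects every (row, col) combination.
a[rows, cols] = 0.0

# Equivalent spelling without np.ix_, using explicit broadcasting.
b = np.ones((6, 8))
idx = np.asarray(indices)
b[idx[:, None], idx] = 0.0

# The chained form assigns into a temporary copy, so c stays all ones.
c = np.ones((6, 8))
c[indices, :][:, indices] = 0.0

print(np.array_equal(a, b))  # True
print(c.sum())               # 48.0
```

The index arrays hold only 3 + 3 integers here (about 30k + 30k on the real data), which would be consistent with the low memory overhead I saw, though I have not measured it carefully.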