
Set numpy array values past a certain index to 0 using another array


I have two arrays, one of floats and one of integers:

import numpy as np

arr1 = np.asarray([[1.5, 0.75, 0.2],
                   [0.3, 1.8, 4.2]])
arr2 = np.asarray([2, 1])

I need to amend arr1 such that

arr1[0, arr2[0]:] = 0
arr1[1, arr2[1]:] = 0

obtaining

array([[1.5 , 0.75, 0.  ],
       [0.3 , 0.  , 0.  ]])

i.e. the nth element of arr2 dictates the number of non-zero elements in row n of arr1.

How can I vectorise this for large arrays, and extend this to further dimensions? Say

rng = np.random.default_rng()
arr1 = rng.uniform(size=(100, 10, 1000))
arr2 = rng.integers(10, size=(100, 1000))

How can I change arr1 (or create a new arr3, depending on the implementation) such that every pair of slices

arr1[:, :, i], arr2[:, i]

follows the rule above?

The final implementation will be on the order of 100,000 x 100 x 10,000.


Solution

  • I think the trick is to create a mask by comparing arr2 against an np.arange of index positions along the axis you want to zero. In your toy example:

    >>> arr2[:, None] > np.arange(3)
    array([[ True,  True, False],
           [ True, False, False]])
    

    This creates a mask of which values to keep (True) and which to replace with zeros (False). With that comparison stored as mask, you can either build a new array with arr3 = np.where(mask, arr1, 0), or modify arr1 in place with arr1[~mask] = 0.
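Putting the toy example together, a minimal sketch might look like the following. The names mask and arr1_inplace are only illustrative, and the copy is there just so both variants can be compared side by side.

```python
import numpy as np

arr1 = np.asarray([[1.5, 0.75, 0.2],
                   [0.3, 1.8, 4.2]])
arr2 = np.asarray([2, 1])

# True where the column index is below the per-row count in arr2.
mask = arr2[:, None] > np.arange(arr1.shape[1])

# Option 1: build a new array and leave arr1 untouched.
arr3 = np.where(mask, arr1, 0)

# Option 2: zero the masked-out entries in place (done on a copy here
# so that the original arr1 stays available for comparison).
arr1_inplace = arr1.copy()
arr1_inplace[~mask] = 0

print(arr3)
```

Both variants should give the array shown in the question. The in-place form avoids allocating a second full-size array, which may matter for large inputs.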

    In your final example, I'm not entirely sure along which dimension you want to zero out the slices, or what the final order means. So here is the trick in general:

    import numpy as np
    
    rng = np.random.default_rng()
    
    # Placeholder sizes for illustration; we zero out slices along the 'D' axis.
    A, B, C, D, E, F = 2, 3, 4, 5, 6, 7
    arr1 = rng.uniform(size=(A, B, C, D, E, F))
    
    # The slice indices only make sense if we choose integers that are smaller than D
    arr2 = rng.integers(D, size=(A, B, C, E, F))
    
    # Insert a length-1 axis in `arr2` at the position of the axis in `arr1`
    # along which you want to zero out the slices.
    # Do the opposite for the `np.arange` indices: length D on that axis,
    # length 1 everywhere else.
    # The two then broadcast against each other to the full shape of `arr1`.
    mask = arr2[:, :, :, None, :, :] > np.arange(D)[None, None, None, :, None, None]
    
    arr3 = np.where(mask, arr1, 0)
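For the 3-D example in the question, the axis to zero along appears to be the middle one (length 10), since arr2 has shape (100, 1000). Under that assumption, the same idea can be wrapped in a small helper. The function name zero_past_count and its axis parameter are hypothetical, introduced here only for illustration; np.expand_dims does the job of the manual None indexing.

```python
import numpy as np

def zero_past_count(arr, counts, axis):
    """Return a copy of arr keeping only the first counts[...] entries along axis.

    Hypothetical helper: counts must have the shape of arr with `axis` removed.
    """
    # Insert a length-1 axis into counts where `axis` sits in arr.
    counts = np.expand_dims(counts, axis)
    # Index positions along `axis`, shaped to broadcast against counts.
    shape = [1] * arr.ndim
    shape[axis] = arr.shape[axis]
    idx = np.arange(arr.shape[axis]).reshape(shape)
    return np.where(counts > idx, arr, 0)

rng = np.random.default_rng(0)
arr1 = rng.uniform(size=(100, 10, 1000))
arr2 = rng.integers(10, size=(100, 1000))

# Assumption: zero along axis 1, so arr1[n, arr2[n, i]:, i] becomes 0.
arr3 = zero_past_count(arr1, arr2, axis=1)
```

At the stated final size of 100,000 x 100 x 10,000, a float64 array alone would be roughly 800 GB, so in practice you would probably need to apply this to chunks along one of the other axes rather than to the whole array at once.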