Tags: python, numpy, indexing, tensor, multi-index

Numpy set all values to np.nan after index in multi-dimensional array


I have two numpy arrays, arr1 and arr2. arr2 contains index values for arr1. The shape of arr1 is (100, 8, 96, 192) and the shape of arr2 is (8, 96, 192). What I would like to do is set all of the values in arr1 that come after the index values in arr2 to np.nan.

For context, arr1 is time x model x lat x lon, and each index value in arr2 corresponds to a point in time along the first axis of arr1. For every model x lat x lon position, I would like to set the arr1 values after that point in time to np.nan.

Sample Data

arr1 = np.random.rand(*(100, 8, 96, 192))
arr2 = np.random.randint(low=0, high=80,size=(8, 96, 192))
in: print(arr1)

out: array([[[[0.61718651, 0.24426295, 0.9165573 , ..., 0.24155022,
          0.22327592, 0.9533857 ],
         [0.21922781, 0.87948651, 0.926359  , ..., 0.64281931,
         ...,
         [0.09342961, 0.29533331, 0.11398662, ..., 0.36239606,
          0.40228814, 0.87284515]]]])
in: print(arr2)

out: array([[[22,  5, 64, ...,  0, 37,  6],
        [71, 48, 33, ...,  8, 38, 32],
        [15, 41, 61, ..., 56, 32, 48],
        ...,
        ...,
        [66, 31, 32, ...,  0, 10,  6],
        [ 9, 28, 72, ..., 71, 29, 34],
        [65, 22, 50, ..., 58, 49, 35]]])

For reference, I previously asked a question with some similarities: "Numpy multi-dimensional index".

Based upon this, I tried

arr1 = np.random.rand(100, 8, 96, 192)
arr2 = np.random.randint(low=0, high=80, size=(8, 96, 192))
I, J, K = np.indices((8, 96, 192), sparse=True)
out = arr1[arr2:, I, J, K]


TypeError: only integer scalar arrays can be converted to a scalar index

This is perhaps similar in concept to "Set values in numpy array to NaN by index", but for arrays with many more dimensions.


Solution

  • In this case, I would recommend indexing with a boolean mask that has the same shape as arr1. Integer array advanced indexing, as in your previous question, is a lot harder here because a variable number of elements needs to be indexed for each model x lat x lon position. Example:

    import numpy as np
    
    arr1 = np.random.rand(*(100, 8, 96, 192))
    arr2 = np.random.randint(low=0, high=80,size=(8, 96, 192))
    
    # These are the possible indices along the first axis in arr1
    # Hence shape (100, 1, 1, 1):
    idx = np.arange(100)[:, None, None, None]
    
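    # idx > arr2 broadcasts to shape (100, 8, 96, 192) and is True
    # strictly after each index in arr2 (use >= to include the index itself)
    arr1[idx > arr2] = np.nan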
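
    To check the broadcasting on something small enough to inspect, here is a minimal sketch of the same mask on a tiny array. The shapes, the names small, cutoff, t, mask and masked, and the use of np.where to build a new array instead of modifying in place are all assumptions for illustration, not part of the original answer.

```python
import numpy as np

# Hypothetical small stand-in for arr1: time=5, model=2, lat=1, lon=2
rng = np.random.default_rng(0)
small = rng.random((5, 2, 1, 2))

# Hypothetical stand-in for arr2: one time index per (model, lat, lon)
cutoff = np.array([[[1, 3]], [[0, 4]]])  # shape (2, 1, 2)

# Time indices with shape (5, 1, 1, 1) broadcast against cutoff (2, 1, 2)
t = np.arange(small.shape[0])[:, None, None, None]
mask = t > cutoff  # shape (5, 2, 1, 2); True strictly after each cutoff

# np.where returns a new array and leaves `small` unchanged
masked = np.where(mask, np.nan, small)

# NaNs per (model, lat, lon) should equal (n_time - 1) - cutoff
nan_counts = np.isnan(masked).sum(axis=0)
print(nan_counts)
```

    If the value at the index itself should also become NaN, t >= cutoff would be the comparison to use instead.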