Slice 2D array using mask


Assume a 2D array with the following rows:

0 = {ndarray: (4,)} [5 0 3 3]
1 = {ndarray: (4,)} [7 9 3 5]
2 = {ndarray: (4,)} [2 4 7 6]
3 = {ndarray: (4,)} [8 8 1 6]

I would like to select the rows where epoch_label is equal to zero. Given the labels

[1 1 0 0]

the selected rows are those at index 2 and 3 (counting from zero).

  • Remark: epoch_label holds integers, and the values can be 0, 1, 2, ...

Using masked_where on the labels produces something like

[1 1 -- --]

The expected output is

[2 4 7 6]
[8 8 1 6]

However, using the code below

import numpy as np
import numpy.ma as ma

Nepochs = 4
epoch_com = [np.random.randint(10, size=4) for _ in range(Nepochs)]
epoch_com_arr = np.array(epoch_com)
epoch_label = np.random.randint(2, size=Nepochs)
mm = ma.masked_where(epoch_label == 0, epoch_label)
expected_output = np.where(epoch_com_arr[mm, :])

The snippet above produces

0 = {ndarray: (14,)} [0 0 0 0 1 1 1 1 2 2 2 3 3 3]
1 = {ndarray: (14,)} [0 1 2 3 0 1 2 3 0 2 3 0 2 3]

which is not what I intended. Alternatively,

expected_output=epoch_com_arr[mm,:]

which produced

0 = {ndarray: (4,)} [7 9 3 5]
1 = {ndarray: (4,)} [7 9 3 5]
2 = {ndarray: (4,)} [5 0 3 3]
3 = {ndarray: (4,)} [5 0 3 3]

How can I solve this?


Solution

  • With

     In [242]: Nepochs = 4
     ...: epoch_com = [np.random.randint(10, size=4) for _ in range(Nepochs)]
     ...: epoch_com_arr=np.array(epoch_com)
     ...: epoch_label=np.random.randint(2, size=Nepochs)
     ...: mm=np.ma.masked_where(epoch_label == 0, epoch_label)
     ...: expected_output=np.where(epoch_com_arr[mm,:])
    

    Looking at the variables:

    In [246]: epoch_com_arr       # a (4,4) array
    Out[246]: 
    array([[7, 1, 3, 3],
           [5, 6, 7, 8],
           [5, 6, 3, 8],
           [3, 5, 1, 1]])
    

    I don't know why you are using the "0 = {ndarray: (4,)} [5 0 3 3]" style of display. It isn't normal numpy.

    I don't think making a masked_array has any benefit:

    In [247]: epoch_label
    Out[247]: array([0, 0, 1, 0])
    In [248]: mm
    Out[248]: 
    masked_array(data=[--, --, 1, --],
                 mask=[ True,  True, False,  True],
           fill_value=999999)
    

    Instead, just convert the 0/1 labels to boolean. Often when we talk about 'masking' we mean using a boolean array as an index, not the use of np.ma.

    In [249]: epoch_label.astype(bool)
    Out[249]: array([False, False,  True, False])
    

    That boolean array can be used to select rows of epoch_com_arr, or alternatively to 'deselect' them:

    In [250]: epoch_com_arr[epoch_label.astype(bool),:]
    Out[250]: array([[5, 6, 3, 8]])
    In [251]: epoch_com_arr[~epoch_label.astype(bool),:]
    Out[251]: 
    array([[7, 1, 3, 3],
           [5, 6, 7, 8],
           [3, 5, 1, 1]])
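
Applied to the data in the question, where the wanted rows are the ones labelled 0, comparing the labels with 0 gives the boolean index directly. This is a minimal sketch: the values are copied from the question, and the short names `arr`, `label` and `keep` are only for illustration.

```python
import numpy as np

# Values copied from the question
arr = np.array([[5, 0, 3, 3],
                [7, 9, 3, 5],
                [2, 4, 7, 6],
                [8, 8, 1, 6]])
label = np.array([1, 1, 0, 0])

keep = label == 0        # boolean array: [False, False, True, True]
selected = arr[keep]     # rows whose label is 0
print(selected)
# [[2 4 7 6]
#  [8 8 1 6]]
```

Comparing with `== 0` also behaves as expected when the labels take values such as 0, 1, 2, ..., whereas `astype(bool)` treats every nonzero label alike.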
    

    I don't think np.where is useful here. It returns the indices of the nonzero elements of epoch_com_arr[mm,:], and indexing with an np.ma array is questionable.
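
To see where the two outputs in the question come from, the sketch below reuses the question's values. It rests on the assumption, consistent with the output shown in the question, that the mask is ignored when a masked array is used as an index, so the underlying integers `[1, 1, 0, 0]` act as row numbers. The names `by_masked` and `by_ints` are illustrative only.

```python
import numpy as np

# Values copied from the question
arr = np.array([[5, 0, 3, 3],
                [7, 9, 3, 5],
                [2, 4, 7, 6],
                [8, 8, 1, 6]])
label = np.array([1, 1, 0, 0])
mm = np.ma.masked_where(label == 0, label)

# Indexing with the masked array uses its data [1, 1, 0, 0] as row numbers
by_masked = np.asarray(arr[mm, :])
by_ints = arr[[1, 1, 0, 0], :]
print(np.array_equal(by_masked, by_ints))   # True

# np.where with one argument returns (row, column) indices of nonzero elements
rows, cols = np.where(by_ints)
print(rows.size)                            # 14
```

Neither result is a row selection by label: the first repeats rows 1 and 0, and the second only locates the nonzero entries of those repeated rows.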

    np.where could be used to convert epoch_label into an index:

    In [252]: idx = np.nonzero(epoch_label)   # aka np.where
    In [253]: idx
    Out[253]: (array([2]),)
    In [254]: epoch_com_arr[idx,:]
    Out[254]: array([[[5, 6, 3, 8]]])
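
The extra dimension in the last output appears because np.nonzero returns a tuple, and passing that tuple together with `:` turns it into a 2D index array. A hedged sketch of two ways to stay 2D, using the values from the session above (`three_d`, `two_d` and `rows` are illustrative names):

```python
import numpy as np

# Values copied from the session above
epoch_com_arr = np.array([[7, 1, 3, 3],
                          [5, 6, 7, 8],
                          [5, 6, 3, 8],
                          [3, 5, 1, 1]])
epoch_label = np.array([0, 0, 1, 0])

idx = np.nonzero(epoch_label)         # tuple holding one index array
three_d = epoch_com_arr[idx, :]       # tuple becomes a (1, 1) index array
two_d = epoch_com_arr[idx[0], :]      # unpack the tuple first
print(three_d.shape, two_d.shape)     # (1, 1, 4) (1, 4)

# np.flatnonzero returns a plain 1D index array, here for label == 0
rows = np.flatnonzero(epoch_label == 0)
print(epoch_com_arr[rows, :])
# [[7 1 3 3]
#  [5 6 7 8]
#  [3 5 1 1]]
```

For a plain row selection the boolean index is usually simpler; integer indices are mainly useful when the row numbers themselves are needed.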