
Numpy array boolean indexing to get containing element


Given a (3,2,2) array, how do I get the whole second-dimension elements that match a single value on the third dimension?

import numpy as np


arr = np.array([
    [[31., 1.], [41., 1.]],
    [[63., 1.], [73., 3.]],
    [[95., 1.], [100., 1.]],
])

ref = arr[(arr[:,:,0] > 41.) & (arr[:,:,0] <= 63)]
print(ref)

Result

[[63.  1.]]

Expected result

[[63.,  1.],[73.,  3.]]

The input value is 63, so I don't know in advance that 73 exists, but I want to return it as well. In other words, if the value exists, return the whole parent array without reshaping.

Another example

ref = arr[(arr[:,:,0] <= 63)]

Returns

[[31.  1.]
 [41.  1.]
 [63.  1.]]

But should return

[[[31.  1.]
 [41.  1.]]
 [[63.  1.]
 [73.  3.]]]

Solution

  • I think you want

    arr[((arr[:,:,0]>41)&(arr[:,:,0]<=63)).any(axis=1)]
    

    and

    arr[(arr[:,:,0] <= 63).any(axis=1)]
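
As a quick check, here is a minimal runnable sketch of the two expressions above applied to the array from the question. The names `first`, `in_range` and `up_to_63` are only illustrative and do not come from the original post.

```python
import numpy as np

arr = np.array([
    [[31., 1.], [41., 1.]],
    [[63., 1.], [73., 3.]],
    [[95., 1.], [100., 1.]],
])

first = arr[:, :, 0]  # shape (3, 2): the first value of every pair

# Reduce along axis 1 so that we get one boolean per subarray along axis 0:
# True if at least one pair in that subarray satisfies the condition.
in_range = ((first > 41) & (first <= 63)).any(axis=1)
up_to_63 = (first <= 63).any(axis=1)

print(in_range)              # [False  True False]
print(arr[in_range])         # the whole second subarray, shape (1, 2, 2)
print(arr[up_to_63].shape)   # (2, 2, 2)
```

Both results keep the (n, 2, 2) layout, so no reshaping is needed afterwards.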
    

    Some explanation. First of all, a 3D array is also a 2D array of 1D arrays, or a 1D array of 2D arrays.

    So, if the expected answer is an array of "whole parents", that is an array of 2D arrays (so a 3D array containing only some of the subarrays), you need a 1D array of booleans as the index. For example, [True, False, False] selects only the 1st row, [[[31., 1.], [41., 1.]]], which is much the same as arr[[0]].

    arr[:,:,0]>41 is a 2D array of booleans, and therefore picks individual pairs (that is, it selects elements along the first two axes). For example, arr[[[True, False], [False, False], [False, False]]] selects only the 1st pair of the 1st subarray, [[31,1]], a bit like arr[[0],[0]] would do.

    So, since you want something like [[[31., 1.], [41., 1.]]], not something like [[31,1]], we need to produce a 1D array of booleans that tells, for each row (each subarray along axis 0), whether we want it or not.
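
To make the difference between the two kinds of mask concrete, here is a small sketch using the masks mentioned above. The variable names `row_mask` and `pair_mask` are made up for illustration.

```python
import numpy as np

arr = np.array([
    [[31., 1.], [41., 1.]],
    [[63., 1.], [73., 3.]],
    [[95., 1.], [100., 1.]],
])

# 1D mask: one boolean per subarray along axis 0, so whole subarrays are kept.
row_mask = np.array([True, False, False])

# 2D mask: one boolean per pair, so individual pairs are picked out.
pair_mask = np.array([[True, False],
                      [False, False],
                      [False, False]])

whole_rows = arr[row_mask]
single_pairs = arr[pair_mask]

print(whole_rows.shape)     # (1, 2, 2)
print(single_pairs.shape)   # (1, 2)

# The integer-index equivalents mentioned in the text:
print(np.array_equal(whole_rows, arr[[0]]))         # True
print(np.array_equal(single_pairs, arr[[0], [0]]))  # True
```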

    Now, the comments were about how to decide whether we want a subarray or not.

    If we start from ref = arr[:,:,0] <= 63, that is, ref =

    array([[ True,  True],
           [ True, False],
           [False, False]])
    

    getting arr[ref] would select 3 pairs, which is not what you want (again, we don't want a 2D array of booleans as selector, since we want whole parents).

    Your attempted answer is to use ref[:,0], which is a 1D array of 3 booleans (the first of each row): [True, True, False], which would select the first 2 rows.

    This answer uses ref.any(axis=1), which is also a 1D array of 3 booleans, each True iff at least one boolean in the row is True. So, here, also [True, True, False].

    The difference between our two answers shows up with another example, used in the comments.

    ref=arr[:,:,0]>97
    
    array([[False, False],
           [False, False],
           [False,  True]])
    

    If we use ref[:,0], that is [False, False, False], then the answer is an empty array. Even the last row is not selected, although it contains a value over 97, because with ref[:,0] we are only interested in rows whose 1st pair has a 1st value > 97.

    If we use ref.any(axis=1), as in this answer, that is [False, False, True], we get the last row, because this means we are interested in any row in which at least one pair has a 1st value > 97.
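
The comparison above can be reproduced with the following sketch. The names `first_pair_only` and `any_pair` are illustrative only.

```python
import numpy as np

arr = np.array([
    [[31., 1.], [41., 1.]],
    [[63., 1.], [73., 3.]],
    [[95., 1.], [100., 1.]],
])

ref = arr[:, :, 0] > 97

first_pair_only = ref[:, 0]   # looks only at the 1st pair of each row
any_pair = ref.any(axis=1)    # True if any pair in the row matches

print(first_pair_only)              # [False False False]
print(arr[first_pair_only].shape)   # (0, 2, 2): nothing selected
print(any_pair)                     # [False False  True]
print(arr[any_pair])                # the whole last row, shape (1, 2, 2)
```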

    We could also select rows in which the second pair has a 1st value > 97 (arr[ref[:,1]]), or rows in which one pair, but not both, has a 1st value > 97 (arr[ref[:,0]^ref[:,1]]), and so on. Everything is possible. The point is: if we want a list of whole rows (subarrays along axis 0), then we need to build a 1D array of 3 booleans, deciding for each row whether we want all of it (True) or none of it (False).
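
As a rough illustration of these alternatives, the sketch below builds a few different 1D row masks from the same 2D boolean array. It uses the `<= 63` condition rather than `> 97`, only because the masks then differ from each other. The `ref.all(axis=1)` variant is an extra example not mentioned above, and the variable names are made up.

```python
import numpy as np

arr = np.array([
    [[31., 1.], [41., 1.]],
    [[63., 1.], [73., 3.]],
    [[95., 1.], [100., 1.]],
])

ref = arr[:, :, 0] <= 63   # [[True, True], [True, False], [False, False]]

second_pair = ref[:, 1]            # rows whose 2nd pair matches
exactly_one = ref[:, 0] ^ ref[:, 1]  # rows where one pair matches, but not both
every_pair = ref.all(axis=1)       # rows where both pairs match (extra example)

print(second_pair)        # [ True False False]
print(exactly_one)        # [False  True False]
print(every_pair)         # [ True False False]
print(arr[exactly_one])   # the whole second row, shape (1, 2, 2)
```

Whatever rule is chosen, the index ends up as a 1D array with one boolean per row, so the selected rows always come back whole.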