
Unexpected behavior of boolean operations in NumPy ndarray inline comparisons


I find that combining several boolean comparisons on NumPy ndarrays using &, |, ==, >=, etc. often gives unexpected results, where pure Python's order of operations seems, on the surface, to be violated (I was wrong about this; for example, True | False==True yields True). What are the "rules", or what is going on under the hood, that explains these results? Here are a few examples:

  1. Comparing a boolean ndarray to the results of an elementwise comparison on a non-boolean ndarray:

    In [36]: a = np.array([1,2,3])
    In [37]: b = np.array([False, True, False])
    In [38]: b & a==2 # unexpected, with no error raised!
    Out[38]: array([False, False, False], dtype=bool)
    
    In [39]: b & (a==2) # enclosing in parentheses resolves this
    Out[39]: array([False,  True, False], dtype=bool)
    
  2. Elementwise &/| on boolean and non-boolean ndarrays:

    In [79]: b = np.array([True,False,True])
    
    In [80]: b & a # comparison is made, then array is re-cast into integers!
    Out[80]: array([1, 0, 1])
    
  3. Finding elements of array within two values:

    In [47]: a>=2 & a<=2 # have seen this in different stackexchange threads
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    
    In [48]: (a>=2) & a<=2 # similar to behavior in In[38], but instead get *True* boolean array of
    Out[48]: array([ True,  True,  True], dtype=bool)
    
    In [49]: (a>=2) & (a<=2) # expected results
    Out[49]: array([False,  True, False], dtype=bool)
    
  4. Logical &/| yielding results outside [0, 1] (which is what I would expect if a boolean result were coerced back into int):

    In [90]: a & 2
    Out[90]: array([0, 2, 2])
    

I welcome additional examples of this behavior.


Solution

  • I think you are confused about the precedence of the & and | binary operators versus the comparison operators:

    >>> import dis
    >>> dis.dis("b & a==2")
      1           0 LOAD_NAME                0 (b)
                  2 LOAD_NAME                1 (a)
                  4 BINARY_AND
                  6 LOAD_CONST               0 (2)
                  8 COMPARE_OP               2 (==)
                 10 RETURN_VALUE
    

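    You can see here that BINARY_AND is performed first (between b and a), and the result is then compared against 2. Since b & a is the integer array [0, 0, 0] (the booleans are upcast to integers before the bitwise AND), no element equals 2 and the result is all False.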
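The evaluation order in the bytecode can be replayed by hand. This is a minimal sketch using the arrays from the question; the intermediate names (`step1`, `out48`, and so on) are illustrative and not part of the original post. The same reasoning appears to account for the all-True result in Out[48].

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([False, True, False])

# Step 1: & runs first. The boolean array is upcast to integers (True -> 1),
# so this is an elementwise bitwise AND: [0 & 1, 1 & 2, 0 & 3].
step1 = b & a
print(step1.tolist())            # [0, 0, 0]

# Step 2: the integer result is compared with 2.
unparenthesized = step1 == 2
print(unparenthesized.tolist())  # [False, False, False]

# With parentheses the comparison runs first, and & combines two boolean arrays.
parenthesized = b & (a == 2)
print(parenthesized.tolist())    # [False, True, False]

# Out[48]: (a>=2) & a<=2 is ((a >= 2) & a) <= 2.
# (a >= 2) & a is [0 & 1, 1 & 2, 1 & 3] = [0, 0, 1], and every element is <= 2.
out48 = ((a >= 2) & a) <= 2
print(out48.tolist())            # [True, True, True]
```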

    & and | bind more tightly than the comparison operators because they are not intended as logical operators: they are the bitwise operators, which NumPy happens to use for elementwise logic. With ints, for example, I'd definitely expect the & to happen first:

    if 13 & 7 == 5:  # parsed as (13 & 7) == 5, and 13 & 7 is 5
        print("13 & 7 equals 5")
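One way to check how Python groups such expressions without running them is to compare syntax trees. The sketch below uses the standard `ast` module; `parse_tree` is a hypothetical helper introduced only for this illustration. Parentheses are not stored in the tree, so two spellings that group the same way produce identical dumps.

```python
import ast


def parse_tree(expr):
    """Return a textual dump of the parsed expression (illustrative helper)."""
    return ast.dump(ast.parse(expr, mode="eval").body)


# & binds more tightly than ==, so the parentheses are redundant here.
same_and_first = parse_tree("13 & 7 == 5") == parse_tree("(13 & 7) == 5")
print(same_and_first)   # True

# The other grouping is a genuinely different tree.
different = parse_tree("13 & 7 == 5") != parse_tree("13 & (7 == 5)")
print(different)        # True

# Example 3 from the question: & takes 2 and a first, leaving the chained
# comparison a >= (2 & a) <= 2.
same_chain = parse_tree("a>=2 & a<=2") == parse_tree("a >= (2 & a) <= 2")
print(same_chain)       # True

# On plain ints, & is a bitwise AND: 0b1101 & 0b0111 == 0b0101.
print(13 & 7)           # 5
```

A chained comparison is evaluated as if its two halves were joined by `and`, which would explain why In [47] raises the "truth value of an array" ValueError rather than returning an array.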
    

    It is unfortunate that NumPy cannot override the behaviour of the logical operators and and or: their precedence makes sense for logical operations, but they cannot be overloaded, so we just have to live with adding lots of brackets when combining boolean arrays.
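As a rough illustration of that limitation, the sketch below shows `and` failing on arrays and then uses the function forms `np.logical_and` and `np.logical_or`, which avoid the precedence question because the operands are ordinary function arguments. The variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])

# `and` asks its left operand for a single truth value via bool(), which
# NumPy refuses for arrays with more than one element.
try:
    (a >= 2) and (a <= 2)
    raised = False
except ValueError:
    raised = True
print(raised)               # True

# Function-call form: each comparison is evaluated before the call.
in_range = np.logical_and(a >= 2, a <= 2)
print(in_range.tolist())    # [False, True, False]

outside = np.logical_or(a < 2, a > 2)
print(outside.tolist())     # [True, False, True]

# Equivalent operator form, with the brackets the answer mentions.
same = np.array_equal(in_range, (a >= 2) & (a <= 2))
print(same)                 # True
```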

    Note that there was a proposal to allow and and or to be overloaded, but it was not accepted, basically because it would have been only a small convenience for NumPy while making all other strict boolean operations slower.