Tags: python, pandas, numpy, magic-methods

Equality Comparison with NumPy Instance Invokes `__bool__`


I have defined a class whose __ge__ method returns an instance of itself, and whose __bool__ method must never be invoked (similar to a pandas Series).

Why is X.__bool__ invoked during np.int8(0) <= x, but not in any of the other examples? Who is invoking it? I have read the Data Model docs, but I haven't found my answer there.

import numpy as np
import pandas as pd

class X:
    def __bool__(self):
        print(f"{self}.__bool__")
        assert False
    def __ge__(self, other):
        print(f"{self}.__ge__")
        return X()

x = X()

np.int8(0) <= x

# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D90>.__bool__
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "<stdin>", line 4, in __bool__
# AssertionError

0 <= x

# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5DF0>

x >= np.int8(0)

# Console output:
# <__main__.X object at 0x000001BAC70D5C70>.__ge__
# <__main__.X object at 0x000001BAC70D5D30>


pd_ge = pd.Series.__ge__
def ge_wrapper(self, other):
    print("pd.Series.__ge__")
    return pd_ge(self, other)

pd.Series.__ge__ = ge_wrapper

pd_bool = pd.Series.__bool__
def bool_wrapper(self):
    print("pd.Series.__bool__")
    return pd_bool(self)

pd.Series.__bool__ = bool_wrapper


np.int8(0) <= pd.Series([1,2,3])

# Console output:
# pd.Series.__ge__
# 0    True
# 1    True
# 2    True
# dtype: bool

Solution

  • TL;DR

    X.__array_priority__ = 1000


    The biggest hint is that it works with a pd.Series.

First I tried having X inherit from pd.Series. This worked (i.e. __bool__ was no longer called).

    To determine whether NumPy uses an isinstance check or a duck-typing approach, I removed the explicit inheritance and instead added (based on this answer):

    @property
    def __class__(self):
        return pd.Series
    

    The operation no longer worked (i.e. __bool__ was called).
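    That probe can be packaged as a self-contained snippet (the class name Y and the try/except harness are mine; it assumes NumPy and pandas are installed):

    ```python
    import numpy as np
    import pandas as pd

    class Y:
        # Spoof isinstance checks: report our class as pd.Series
        # without actually inheriting from it.
        @property
        def __class__(self):
            return pd.Series

        def __bool__(self):
            raise AssertionError(f"{self}.__bool__ was invoked")

        def __ge__(self, other):
            return Y()

    y = Y()

    # Python-level isinstance is fooled by the __class__ property...
    print(isinstance(y, pd.Series))  # True

    # ...but NumPy is not: in the session above, the comparison
    # still ended in __bool__ and the AssertionError.
    try:
        np.int8(0) <= y
    except AssertionError as exc:
        print("NumPy still truth-tested the result:", exc)
    ```

    So whatever NumPy is checking, it is not isinstance.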

    So now I think we can conclude that NumPy is duck-typing. Next, I checked which attributes are being accessed on X.

    I added the following to X:

    def __getattribute__(self, item):
        print("getattr", item)
        return object.__getattribute__(self, item)
    

    Again instantiating X as x, and invoking np.int8(0) <= x, we get:

    getattr __array_priority__
    getattr __array_priority__
    getattr __array_priority__
    getattr __array_struct__
    getattr __array_interface__
    getattr __array__
    getattr __array_prepare__
    <__main__.X object at 0x000002022AB5DBE0>.__ge__
    <__main__.X object at 0x000002021A73BE50>.__bool__
    getattr __array_struct__
    getattr __array_interface__
    getattr __array__
    Traceback (most recent call last):
      File "<stdin>", line 32, in <module>
        np.int8(0) <= x
      File "<stdin>", line 21, in __bool__
        assert False
    AssertionError
    

    Ah-ha! What is __array_priority__? Who cares, really; with a little digging, all we need to know is that NDFrame (from which pd.Series inherits) sets this value to 1000. (Roughly: it tells NumPy to defer to the higher-priority operand, so np.int8's comparison returns NotImplemented and Python falls back to X.__ge__ instead of coercing x to an array.)

    If we add X.__array_priority__ = 1000, it works! __bool__ is no longer called.
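    Putting it together, a minimal sketch of the fixed class (same X as in the question, plus the one class attribute; the mechanism comment reflects my understanding of NumPy's priority-deferral behavior):

    ```python
    import numpy as np

    class X:
        # pd.Series inherits this value (1000) from NDFrame. A high
        # priority makes NumPy defer to this operand, so
        # np.int8.__le__ returns NotImplemented and Python falls
        # back to X.__ge__.
        __array_priority__ = 1000

        def __bool__(self):
            raise AssertionError(f"{self}.__bool__ was invoked")

        def __ge__(self, other):
            return X()

    x = X()
    result = np.int8(0) <= x      # only X.__ge__ runs; no __bool__
    print(isinstance(result, X))  # True
    ```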

    What made this so difficult (I believe) is that the NumPy code doesn't show up in the call stack because it is written in C. I could investigate further by trying out the suggestion here.
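    As an aside: since NumPy 1.13 (NEP 13) there is a documented way to opt out of NumPy's operator machinery entirely, by setting __array_ufunc__ = None; NumPy's binary operators then return NotImplemented and Python falls back to the reflected method. A sketch (the class name Z is mine):

    ```python
    import numpy as np

    class Z:
        # Opt out of NumPy's ufunc machinery: NumPy's operators
        # return NotImplemented, so Python dispatches to Z.__ge__.
        __array_ufunc__ = None

        def __bool__(self):
            raise AssertionError(f"{self}.__bool__ was invoked")

        def __ge__(self, other):
            return Z()

    z = Z()
    result = np.int8(0) <= z      # dispatches straight to z.__ge__
    print(isinstance(result, Z))  # True
    ```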