In pure Python, None or True returns True.
However, with pandas, when I apply | between two Series containing None values, the results are not what I expected:
>>> df.to_dict()
{'buybox': {0: None}, 'buybox_y': {0: True}}
>>> df
buybox buybox_y
0 None True
>>> df['buybox'] = (df['buybox'] | df['buybox_y'])
>>> df
buybox buybox_y
0 False True
Expected result:
>>> df
buybox buybox_y
0 True True
I get the result I want by applying the OR operation twice, but I don't get why I should do this.
I'm not looking for a workaround (I have one: applying df['buybox'] = (df['buybox'] | df['buybox_y']) twice in a row) but an explanation, thus the 'why' in the title.
The pandas | operator does not rely on Python's or expression, and it behaves differently.
If both operands are boolean, the result is mathematically defined, and the same for Python and Pandas.
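As a quick check of the all-boolean case, here is a minimal sketch comparing | on two bool Series with Python's or applied element by element. The names a, b, pandas_or and python_or are made up for this illustration.

```python
import pandas as pd

# Two bool-dtype Series covering every combination of True/False.
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

# Element-wise OR as computed by pandas.
pandas_or = (a | b).tolist()

# The same truth table computed with plain Python `or`.
python_or = [x or y for x, y in zip(a.tolist(), b.tolist())]

print(pandas_or == python_or)
```

With no missing values and bool dtype on both sides, the two approaches should agree.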
But in your case, the series "buybox" is of type object, and "buybox_y" is of type bool. In this case the pandas | operator is not commutative:
a bitwise or is attempted, and None | True is an invalid operation, resulting in None.
Thus,
>>> df['buybox'] | df['buybox_y']
0 False
>>> df['buybox_y'] | df['buybox']
0 True
For predictable results, you can clean up the data and cast it to a boolean type with pandas astype before attempting boolean operations.
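A minimal sketch of that clean-up, assuming a missing value (None or NaN) should count as False. That choice is an assumption about your data, and to_bool is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Rebuild the frame from the question: one object column, one bool column.
df = pd.DataFrame({"buybox": [None], "buybox_y": [True]})

def to_bool(series):
    """Hypothetical helper: map missing values to False, then cast to bool."""
    cleaned = series.map(lambda v: False if pd.isna(v) else bool(v))
    return cleaned.astype(bool)

left = to_bool(df["buybox"])
right = to_bool(df["buybox_y"])

# Both operands are now bool dtype, so the order no longer matters.
df["buybox"] = left | right
print(df)
```

Cleaning first also sidesteps the fact that casting an object column straight to bool would turn NaN into True, since NaN is truthy in Python.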