While working on a problem involving the bitwise AND operator, I stumbled over the behavior below.
Applying the same conditional check to the Series of two pandas DataFrames returns different results, depending on whether the columns hold integers or floats.
In [91]: df = pd.DataFrame({"h": [5300, 5420, 5490], "l": [5150, 5270, 5270]})
In [92]: df
Out[92]:
      h     l
0  5300  5150
1  5420  5270
2  5490  5270
In [93]: df2 = pd.DataFrame({"h": [5300.1, 5420.1, 5490.1], "l": [5150.1, 5270.1, 5270.1]})
In [94]: df2
Out[94]:
        h       l
0  5300.1  5150.1
1  5420.1  5270.1
2  5490.1  5270.1
In [95]: df["h"].notna() & df["l"]
Out[95]:
0 False
1 False
2 False
dtype: bool
In [96]: df2["h"].notna() & df2["l"]
Out[96]:
0 True
1 True
2 True
dtype: bool
You've hit some weird implicit casting. I believe what you mean is:
df["h"].notna() & df["l"].notna()
or perhaps
df["h"].notna() & df["l"].astype(bool)
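The two suggestions are not interchangeable, so it may help to see them side by side. The sketch below uses a small hypothetical Series (the names s and mask are made up for illustration, not taken from the question) holding an ordinary value, a zero, and a missing value. notna() asks whether a value is present, while astype(bool) asks whether it is non-zero, and NaN counts as non-zero.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an ordinary value, a zero, and a missing value.
s = pd.Series([5150.0, 0.0, np.nan])
mask = pd.Series([True, True, True])

# Both operands are boolean, so no implicit casting is involved.
not_missing = mask & s.notna()    # True where a value is present
non_zero = mask & s.astype(bool)  # True where the value is non-zero (NaN included)

print(not_missing.tolist())  # [True, True, False]
print(non_zero.tolist())     # [True, False, True]
```

Pick whichever matches the question you are actually asking of the data.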
In the original,
df["h"].notna() & df["l"]
you have requested a bitwise operation on two Series, the first of which has boolean dtype and the second of which has either integer dtype (in df) or float dtype (in df2).
In the first case, a boolean can be upcast to an int. It appears that the boolean True is upcast to the integer 1 (binary 0000000001) and then bitwise-ANDed with the integers 5150, 5270, and 5270. Each result is 0, since all of those numbers are even and therefore have a lowest bit of 0, and 0 is then reported as False. For example, if you set
df.loc[2, 'l'] = 5271
you will see that the final value changes to True, since 5271 is odd and its lowest bit is 1.
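The arithmetic can be checked in plain NumPy, without pandas. This is only a sketch of the integer case: it assumes pandas ends up performing the equivalent of this bitwise AND and then converting the result to bool, which matches the output above but is not confirmed here. The array names flags and lows are made up for illustration.

```python
import numpy as np

flags = np.array([True, True, True])
lows = np.array([5150, 5270, 5271])  # last value made odd, as in the example above

# The boolean array is promoted to integers (True -> 1), then ANDed bit by bit,
# so only the lowest bit of each number survives.
raw = flags & lows
print(raw.tolist())               # [0, 0, 1]
print(raw.astype(bool).tolist())  # [False, False, True]

# The same thing with plain Python integers:
print(5150 & 1, 5271 & 1)         # 0 1
```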
In the case of df2, a float and a bool cannot be bitwise-ANDed together. It appears that pandas may be implicitly converting the float array to bool dtype here. NumPy itself would not do this:
In [79]: np.float64([.1, .2]) & np.array([True, True])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-79-2c2e50f0bf99> in <module>
----> 1 np.float64([.1, .2]) & np.array([True, True])
TypeError: ufunc 'bitwise_and' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
But pandas seems to allow it:
In [88]: pd.Series([True, True, True]) & pd.Series([0, .1, .2])
Out[88]:
0 False
1 True
2 True
dtype: bool
The same result can be achieved in NumPy by calling astype(bool) explicitly:
In [92]: np.array([True, True, True]) & np.float64([0, .1, .2]).astype(bool)
Out[92]: array([False, True, True])