I have two dataframes, df1 and df2, as follows:
>>> import pandas as pd
>>> df1 = pd.DataFrame([[1,2],[3,4],[5,6],[7,8]])
>>> df2 = pd.DataFrame([1,2,3,4,5,6,7,8])
>>> df1
   0  1
0  1  2
1  3  4
2  5  6
3  7  8
>>> df2
   0
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8
When I check whether 1 is in df1, it yields True, as expected:
>>> any(df1 == 1)
True
However, when I try the same on df2, I unexpectedly get False:
>>> any(df2 == 1)
False
This happens even though the element-wise comparisons look right in both cases:
>>> df1 == 1
       0      1
0   True  False
1  False  False
2  False  False
3  False  False
>>> df2 == 1
       0
0   True
1  False
2  False
3  False
4  False
5  False
6  False
7  False
Any ideas why that is?
PS: I am not asking about pandas' built-in any method. I am just puzzled by the behavior of Python's built-in any.
You need to use the pandas built-in any method instead of the built-in any from base Python:
df1.eq(1).any().any()
# True
df2.eq(1).any().any()
# True
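To see why two calls to .any() are needed, the reduction can be split into steps. This is a minimal sketch that rebuilds df1 and df2 from the question; the helper name contains_value is hypothetical and only introduced for illustration. The first .any() collapses each column into one boolean, and the second collapses that Series into a single value. Converting to a NumPy array first is an alternative that needs only one call.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])

# Element-wise comparison: a boolean DataFrame with the same shape as df1
mask = df1.eq(1)

# First .any() reduces each column to one boolean -> a Series indexed by column label
per_column = mask.any()
print(per_column.tolist())  # [True, False]

# Second .any() reduces that Series to a single scalar
print(bool(per_column.any()))  # True


def contains_value(df: pd.DataFrame, value) -> bool:
    """Hypothetical helper: True if `value` appears in any cell of `df`."""
    # to_numpy() drops the labels, so a single .any() scans every cell
    return bool(df.eq(value).to_numpy().any())


print(contains_value(df1, 1))  # True
print(contains_value(df2, 1))  # True
print(contains_value(df2, 9))  # False
```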
When you use Python's built-in any, it treats the DataFrame as an iterable. Like a dictionary, which yields its keys when iterated, a DataFrame yields its column labels, so any only checks the column labels and never looks at the values. If you simply loop through df1 and df2, you can see that only the column labels come back. Since df1 == 1 has the same column labels as df1, namely 0 and 1, any(df1 == 1) amounts to any([0, 1]), which returns True because 1 is truthy. df2, on the other hand, has only one column, labeled 0, and any([0]) returns False because 0 is falsy. So any(df == 1) is effectively equivalent to any(df) or any(df.columns):
[x for x in df1]
# [0, 1]
[x for x in df2]
# [0]
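One way to confirm that the built-in any never inspects the values is to build a frame whose cells are all False but whose column label is truthy, and another whose cells include True but whose only label is falsy. The two frames below (all_false, has_true) are made up purely for illustration; this is a small sketch, not part of the original answer.

```python
import pandas as pd

# All cells are False, but the single column label "a" is a truthy string
all_false = pd.DataFrame({"a": [False, False]})
# One cell is True, but the only column label is 0, which is falsy
has_true = pd.DataFrame({0: [True, False]})

# Iterating a DataFrame yields its column labels, like iterating a dict yields keys
print(list(all_false))  # ['a']
print(list(has_true))   # [0]

# So the built-in any() tests the labels, not the cells
print(any(all_false))  # True
print(any(has_true))   # False

# Reducing over the underlying array looks at the cells instead
print(bool(all_false.to_numpy().any()))  # False
print(bool(has_true.to_numpy().any()))   # True

# any(df), any(df.columns) and any(df == 1) agree because they all see the same labels
df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
print(any(df2), any(df2.columns), any(df2 == 1))  # False False False
```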