python pandas any

Inconsistent behavior of any(df == value) on pandas dataframe


I have two dataframes, df1 and df2, as follows:

>>> import pandas as pd
>>> df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
>>> df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])
>>> df1
   0  1
0  1  2
1  3  4
2  5  6
3  7  8
>>> df2 
   0
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8

Checking whether 1 is in df1 yields True, as expected:

>>> any(df1 == 1) 
True

However, when I try the same on df2, I unexpectedly get False:

>>> any(df2 == 1)
False

Yet the element-wise comparisons themselves look right:

>>> df1 == 1
       0      1
0   True  False
1  False  False
2  False  False
3  False  False
>>> df2 == 1
       0
0   True
1  False
2  False
3  False
4  False
5  False
6  False
7  False

Any idea why that is?

PS: I am not asking about the any method built into pandas. I am just puzzled by the behavior of Python's built-in any here.


Solution

  • You need to use the pandas any method instead of Python's built-in any:

    df1.eq(1).any().any()
    # True
    
    df2.eq(1).any().any()
    # True
    

    Python's built-in any treats the data frame as a plain iterable, and iterating over a data frame yields its column labels, just as iterating over a dictionary yields its keys. The values of the data frame are never inspected. df1 == 1 has the column labels 0 and 1, so any([0, 1]) returns True; df2 == 1 has the single column label 0, so any([0]) returns False. In other words, any(df == 1) is effectively equivalent to any(df) or any(df.columns), as looping over df1 and df2 shows:

    [x for x in df1]
    # [0, 1]
    
    [x for x in df2]
    # [0]
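
To make the explanation above concrete, here is a minimal sketch that reproduces both results and contrasts them with a reduction over the values. The variable names (mask1, renamed, and so on) and the renamed column label "a" are illustrative assumptions, not part of the original question; the use of to_numpy() is just one possible alternative to chaining .any().any().

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
df2 = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8])

mask1 = df1 == 1
mask2 = df2 == 1

# Iterating over a DataFrame yields its column labels, not its values.
labels1 = list(mask1)  # column labels of df1: 0 and 1
labels2 = list(mask2)  # column label of df2: 0

# The built-in any() therefore only tests the truthiness of those labels.
builtin_any1 = any(mask1)  # same as any([0, 1])
builtin_any2 = any(mask2)  # same as any([0])

# Reducing over the underlying values gives the intended answer.
values_any1 = bool(mask1.to_numpy().any())
values_any2 = bool(mask2.to_numpy().any())

# Hypothetical check: renaming the single column to a truthy label ("a")
# makes the built-in any() return True, even for a value that is absent.
renamed = df2.rename(columns={0: "a"})
builtin_any_renamed = any(renamed == 1)
builtin_any_absent = any(renamed == 999)

print(labels1, labels2)
print(builtin_any1, builtin_any2)
print(values_any1, values_any2)
print(builtin_any_renamed, builtin_any_absent)
```

The last two results suggest that the built-in any is answering a question about the column labels only, which is why it should not be relied on to test membership in a data frame.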