Tags: python, pandas, list, tuples, comparison-operators

pandas comparison operator `==` not working as expected when a column contains a `list` instead of a `tuple`


import pandas as pd
import numpy as np

df = pd.DataFrame({'Li':[[1,2],[5,6],[8,9]],'Tu':[(1,2),(5,6),(8,9)]})
df
       Li      Tu
0  [1, 2]  (1, 2)
1  [5, 6]  (5, 6)
2  [8, 9]  (8, 9)

It works fine for the tuple column:

df.Tu == (1,2)
0     True
1    False
2    False
Name: Tu, dtype: bool

For the list column, it raises a ValueError:

df.Li == [1,2]

ValueError: Lengths must match to compare


Solution

  • The problem is that pandas treats [1, 2] as an array-like object and tries to compare each element of df.Li with the corresponding element of [1, 2], hence the error:

    ValueError: Lengths must match to compare

    A list of length 2 cannot be compared element-wise with a Series of length 3 (df.Li). To verify this, compare against a list of length 3:

    print(df.Li == [1, 2, 3])
    

    Output

    0    False
    1    False
    2    False
    Name: Li, dtype: bool
    

    This raises no error, but every row is False, as expected: each list in df.Li is compared with a single integer (1, 2 or 3), not with a list. To compare each row against the list itself, you can do the following:

    # this creates an object array where each element is the list [1, 2]
    # (use the builtin object; the np.object alias was removed in NumPy 1.24)
    data = np.empty(3, dtype=object)
    data[:] = [[1, 2] for _ in range(3)]
    
    print(df.Li == data)
    

    Output

    0     True
    1    False
    2    False
    Name: Li, dtype: bool
    

    All in all, this looks like a bug on the pandas side.
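
If building the object array by hand feels awkward, the same comparison can be sketched in two other ways. This is only an illustration under assumptions: the names target, mask_apply, mask_array and the helper as_object_array are made up here and are not part of pandas. The first option compares row by row in plain Python; the second wraps the object-array trick above so it is sized from the Series instead of a hard-coded 3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Li': [[1, 2], [5, 6], [8, 9]],
                   'Tu': [(1, 2), (5, 6), (8, 9)]})
target = [1, 2]

# Option 1: compare each cell with the target in plain Python, so pandas
# never receives the list as an array-like operand.
mask_apply = df.Li.apply(lambda cell: cell == target)


# Option 2: hypothetical helper (not a pandas or NumPy function) that
# builds a 1-D object array holding the same value in every slot.
def as_object_array(value, n):
    out = np.empty(n, dtype=object)
    for i in range(n):
        out[i] = value  # store the list itself as a single element
    return out


mask_array = df.Li == as_object_array(target, len(df))

print(mask_apply.tolist())  # [True, False, False]
print(mask_array.tolist())  # [True, False, False]
```

Either mask can then be used for filtering, for example df[mask_apply]. The apply version is usually easier to read; both loop over Python objects, so neither is vectorized in the usual sense.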