pandas, python-3.7

Pandas: Comparing more than 2 columns fails


In pandas, I want to compare only 3 columns (chosen by name) out of the 8 columns in my DataFrame, and store the result in "Outcome".

  • [You will find many similar questions, but most of them are irrelevant here: they compare all the columns in the DataFrame, not a few columns picked from a larger dataset, as happens in real-world analysis. I want to choose the columns to compare by name.]
# Columns to compare are  ::  ColB, ColD and ColF


Fruits  ColA  ColB  ColC  ColD  ColE  ColF  Outcome
Loquat  83    98    91    98    78    96    FALSE
Medlar  82    94    87    94    91    94    TRUE
Pear    77    74    79    71    79    71    FALSE
Quince  71    93    78    93    92    93    TRUE
Date    98    81    73    94    97    99    FALSE
Rowan   89    85    77    85    95    85    TRUE
Lime    97    91    71    90    88    85    FALSE

Is there any code that can compare more than 2 columns at a time and return a boolean? (I know that comparing 2 columns works with the code below, but when I add a third column it raises the error shown at the end.)

#  I have tried the below code:

df.loc[(df['ColB']==df['ColD']==df['ColF']), 'Outcome'] = "True"

Traceback (most recent call last):

File "C:\Py378\Tests\Trial.py", line 15, in <module>
    df.loc[(df['ColB']==df['ColD']==df['ColF']), 'Outcome'] = "True"

  File "c:\py378\py\lib\site-packages\pandas\core\generic.py", line 1479, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The above would have worked if I removed "==df['ColF']" from it, so I know comparing 2 columns works. Is there a form in which I can list the columns by name (3 to 5 of them, or more) and have the comparison work?


Solution

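  • Python evaluates a chained comparison such as a == b == c as (a == b) and (b == c). The "and" needs a single truth value from each Series, which is what raises the ValueError. Compare the columns pairwise and combine the masks with the element-wise & operator instead: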

    df.loc[(df['ColB']==df['ColD']) & (df['ColD']==df['ColF']), 'Outcome'] = "True"
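With 4 or 5 columns the pairwise masks get long. The following is a minimal sketch of one way to generalize the idea, not necessarily the only or best one. It assumes the sample data from the question rebuilt by hand as a DataFrame named df, and a hypothetical list named cols holding the names of the columns to compare. Each chosen column is compared against the first one with DataFrame.eq, and a row counts as a match only if all of those comparisons hold.

```python
import pandas as pd

# Sample data copied from the question (the Outcome column is computed below)
df = pd.DataFrame({
    "Fruits": ["Loquat", "Medlar", "Pear", "Quince", "Date", "Rowan", "Lime"],
    "ColA": [83, 82, 77, 71, 98, 89, 97],
    "ColB": [98, 94, 74, 93, 81, 85, 91],
    "ColC": [91, 87, 79, 78, 73, 77, 71],
    "ColD": [98, 94, 71, 93, 94, 85, 90],
    "ColE": [78, 91, 79, 92, 97, 95, 88],
    "ColF": [96, 94, 71, 93, 99, 85, 85],
})

# Hypothetical name: the columns to compare, chosen by name
cols = ["ColB", "ColD", "ColF"]

# Compare every chosen column with the first one, row by row (axis=0 aligns
# the Series on the row index), then require all comparisons to be True.
df["Outcome"] = df[cols].eq(df[cols[0]], axis=0).all(axis=1)

print(df[["Fruits"] + cols + ["Outcome"]])
```

Adding more names to cols extends the comparison without touching the rest of the code. Note that this stores real booleans for every row, whereas the df.loc assignment above writes the string "True" only into the matching rows.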