python, python-3.x, pandas, dataframe, data-manipulation

Simplify Boolean Indexing Conditions in Pandas


I have two columns, each of which contains either a "Yes" or a "No" value:

  • Partner
  • Dependents

I originally wrote this code to generate another column based on the values in those two columns (see below):

conditions = [
    # either dependents, partner, or both
    ((cleaned_df["Partner"] == "Yes") & (cleaned_df["Dependents"] == "Yes"))
    | ((cleaned_df["Partner"] == "No") & (cleaned_df["Dependents"] == "Yes"))
    | ((cleaned_df["Partner"] == "Yes") & (cleaned_df["Dependents"] == "No")),
    # neither partner nor dependents
    (cleaned_df["Partner"] == "No") & (cleaned_df["Dependents"] == "No")
]

However, the first condition is a bit wordy, and I was wondering whether there is a more concise way to write it. Thanks in advance!


Solution

  • Take a minimal example:

    import pandas as pd
    data = {'Partner' : ['Yes','No','Yes','No'] , 'Dependents': ['No','No','Yes','Yes']}
    df = pd.DataFrame(data)
    
    print(df)

      Partner Dependents
    0     Yes         No
    1      No         No
    2     Yes        Yes
    3      No        Yes
    

    You can simply do:

    (df["Partner"] == "Yes") | (df["Dependents"] == "Yes")
    
    0     True
    1    False
    2     True
    3     True
    dtype: bool
    

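    Explanation: the three cases in the original first condition together cover every row where at least one of the two columns is "Yes". That is exactly what the element-wise OR operator (|) expresses: the result is True if either comparison is True, and False only when both are False.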
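
The question does not show how the conditions list is consumed, so the following is only a sketch under some assumptions: both columns contain nothing but "Yes" and "No" (as stated in the question), and the new column name "Family" and its labels are hypothetical placeholders. It shows that the second condition is simply the negation of the first, so `np.where` with a single mask is enough, and that `.eq("Yes").any(axis=1)` gives the same mask when more columns are involved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes", "No"],
    "Dependents": ["No", "No", "Yes", "Yes"],
})

# True where at least one of the two columns is "Yes"
has_family = (df["Partner"] == "Yes") | (df["Dependents"] == "Yes")

# Equivalent mask that extends more easily to additional columns
has_family_alt = df[["Partner", "Dependents"]].eq("Yes").any(axis=1)

# "Neither partner nor dependents" is just ~has_family, so one mask
# is enough. "Family" and the labels are hypothetical placeholders.
df["Family"] = np.where(has_family, "Yes", "No")

print(df)
```

If the original two-element conditions list is still needed, for example for `np.select`, it can be written as `[has_family, ~has_family]`.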