python pandas iterable-unpacking

How to set a value on a DataFrame given a variable number of conditions?


from itertools import product
import pandas as pd

animals = ["dogs", "cats"]
eyes = ['brown', 'blue', 'green']
height = ['short', 'average', 'tall']
a = [animals, eyes, height]
df = pd.DataFrame(list(product(*a)), columns=["animals", "eyes", "height"])
df['value'] = 1

Output:

   animals   eyes   height  value
0     dogs  brown    short      1
1     dogs  brown  average      1
2     dogs  brown     tall      1
3     dogs   blue    short      1
4     dogs   blue  average      1
5     dogs   blue     tall      1
6     dogs  green    short      1

Question: How do I create a single function that zeros "value" in one or many rows, given one or many conditions?

Examples:

# This would change all the 1s into 0s for all dogs with blue eyes.
zero_out(df, [("animals", "dogs"), ("eyes", "blue")])

# This would change all the 1s into 0s for all tall animals.
zero_out(df, [("height", "tall")])

My attempts thus far: I tried to do this with * unpacking but had no luck, because I don't know how to combine multiple conditions built from unpacked variables. It's easy to combine conditions if I hardcode how many there are, though: df.loc[(condition1) & (condition2) & (condition3), "value"] = 0

Further, and perhaps this is outside the scope of the question: how can I test a variable number of conditions in a regular if statement using * unpacking (or otherwise without hardcoding the number of conditions in the if statement)?

E.g.

if a > 0 and b > 4:
    ...
# Or...
if a > 0 and b > 4 and c < 2:
    ...

Your help is appreciated.


Solution

  • If I understand you correctly, you are looking for the .query() method:

    import pandas as pd
    from itertools import product
    
    animals = ["dogs", "cats"]
    eyes = ['brown', 'blue', 'green']
    height = ['short', 'average', 'tall']
    a = [animals, eyes, height]
    df = pd.DataFrame(list(product(*a)), columns=["animals", "eyes", "height"])
    df['value'] = 1
    
    
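    def zero_out(df, lst):
        # Build a query string such as: animals == "dogs" & eyes == "blue"
        q = ' & '.join('{} == "{}"'.format(col, val) for col, val in lst)
        # Zero "value" in place on the rows that satisfy every condition
        df.loc[df.query(q).index, 'value'] = 0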
    
    zero_out(df, [("height", "tall")])
    print(df)
    

    Prints:

       animals   eyes   height  value
    0     dogs  brown    short      1
    1     dogs  brown  average      1
    2     dogs  brown     tall      0
    3     dogs   blue    short      1
    4     dogs   blue  average      1
    5     dogs   blue     tall      0
    6     dogs  green    short      1
    7     dogs  green  average      1
    8     dogs  green     tall      0
    9     cats  brown    short      1
    10    cats  brown  average      1
    11    cats  brown     tall      0
    12    cats   blue    short      1
    13    cats   blue  average      1
    14    cats   blue     tall      0
    15    cats  green    short      1
    16    cats  green  average      1
    17    cats  green     tall      0
    

    Or, starting again from the original df, zero_out(df, [("animals", "dogs"), ("eyes", "blue")]) prints:

       animals   eyes   height  value
    0     dogs  brown    short      1
    1     dogs  brown  average      1
    2     dogs  brown     tall      1
    3     dogs   blue    short      0
    4     dogs   blue  average      0
    5     dogs   blue     tall      0
    6     dogs  green    short      1
    7     dogs  green  average      1
    8     dogs  green     tall      1
    9     cats  brown    short      1
    10    cats  brown  average      1
    11    cats  brown     tall      1
    12    cats   blue    short      1
    13    cats   blue  average      1
    14    cats   blue     tall      1
    15    cats  green    short      1
    16    cats  green  average      1
    17    cats  green     tall      1