Search code examples
pythonpandasdataframeuniqueconditional-statements

Pandas apply function to unique values in column


I've been stuck on a Pandas problem and I can't seem to figure it out. I have a dataframe like this:

ref, value, rule, result, new_column
a100, 25, high, fail, nan
a100, 25, high, pass, nan
a100, 25, medium, fail, nan
a100, 25, medium, pass, nan
a101, 15, high, fail, nan
a101, 15, high, pass, nan
a102, 20, high, pass, nan

I want to add a new column to this dataframe with the following pseudocode

For each unique value in ref, if result = fail, then new_column = no for all subsequent rows of the same "ref" value.

This is how the new dataframe should look like.

ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, no
a100, 25, medium, fail, no
a100, 25, medium, pass, no
a101, 15, high, fail, no
a101, 15, high, pass, no
a102, 20, high, pass, yes

What I've managed to do is the following:

ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, yes

This is achieved through the df.loc function. But I need the function to apply to unique values, rather than each row.


Solution

  • I think you can use transform:

    print (df)
        ref  value    rule result  new_column
    0  a100     25    high   pass         NaN
    1  a100     25    high   fail         NaN
    2  a100     25  medium   fail         NaN
    3  a100     25  medium   pass         NaN
    4  a101     15    high   fail         NaN
    5  a101     15    high   pass         NaN
    6  a102     20    high   pass         NaN
    
    df['new_column']=df.groupby('ref')['result']
                       .transform(lambda x: 'no' if ((x=='fail').any()) else 'yes')
    print (df)
        ref  value    rule result new_column
    0  a100     25    high   pass         no
    1  a100     25    high   fail         no
    2  a100     25  medium   fail         no
    3  a100     25  medium   pass         no
    4  a101     15    high   fail         no
    5  a101     15    high   pass         no
    6  a102     20    high   pass        yes
    

    Thank you Jon Clements for another solution with replace:

    df['new_column'] = df.groupby('ref')['result']
                         .transform(lambda L: (L == 'fail').any())
                         .replace({True: 'no', False: 'yes'})
    
    print (df)
        ref  value    rule result new_column
    0  a100     25    high   pass         no
    1  a100     25    high   fail         no
    2  a100     25  medium   fail         no
    3  a100     25  medium   pass         no
    4  a101     15    high   fail         no
    5  a101     15    high   pass         no
    6  a102     20    high   pass        yes