I've been stuck on a Pandas problem and I can't seem to figure it out. I have a dataframe like this:
ref, value, rule, result, new_column
a100, 25, high, fail, nan
a100, 25, high, pass, nan
a100, 25, medium, fail, nan
a100, 25, medium, pass, nan
a101, 15, high, fail, nan
a101, 15, high, pass, nan
a102, 20, high, pass, nan
I want to add a new column to this dataframe with the following pseudocode
For each unique value in ref, if result = fail
, then new_column = no
for all subsequent rows of the same "ref" value.
This is how the new dataframe should look like.
ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, no
a100, 25, medium, fail, no
a100, 25, medium, pass, no
a101, 15, high, fail, no
a101, 15, high, pass, no
a102, 20, high, pass, yes
What I've managed to do is the following:
ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, yes
This is achieved through the df.loc
function.
But I need the function to apply to unique values, rather than each row.
I think you can use transform
:
print (df)
ref value rule result new_column
0 a100 25 high pass NaN
1 a100 25 high fail NaN
2 a100 25 medium fail NaN
3 a100 25 medium pass NaN
4 a101 15 high fail NaN
5 a101 15 high pass NaN
6 a102 20 high pass NaN
df['new_column']=df.groupby('ref')['result']
.transform(lambda x: 'no' if ((x=='fail').any()) else 'yes')
print (df)
ref value rule result new_column
0 a100 25 high pass no
1 a100 25 high fail no
2 a100 25 medium fail no
3 a100 25 medium pass no
4 a101 15 high fail no
5 a101 15 high pass no
6 a102 20 high pass yes
Thank you Jon Clements
for another solution with replace
:
df['new_column'] = df.groupby('ref')['result']
.transform(lambda L: (L == 'fail').any())
.replace({True: 'no', False: 'yes'})
print (df)
ref value rule result new_column
0 a100 25 high pass no
1 a100 25 high fail no
2 a100 25 medium fail no
3 a100 25 medium pass no
4 a101 15 high fail no
5 a101 15 high pass no
6 a102 20 high pass yes