I have a dataframe where some of the values in the 'Type' column are incorrect:
ID | Type | Other Data |
---|---|---|
ABC | 10.0 | 56 |
ABC | 10.0 | 34 |
ABC | 0.0 | 78 |
DEF | 20.0 | 45 |
DEF | 0.0 | 23 |
ZWS | 20.0 | 99 |
TYH | 10.0 | 12 |
TYH | 10.0 | 63 |
FCE | 0.0 | 56 |
FCE | 20.0 | 78 |
XSA | 10.0 | 74 |
RED | 0.0 | 55 |
QWA | 20.0 | 74 |
I would like to go through each ID and identify if the Type is equal to 0.0, then replace it with a value from the a row where the same id has a value other than 0.0. If there is no other row with the ID, and it is 0.0 in the type column then change it to Nan, or 99 so it can be removed later.
The outcome would be: ABC on 3rd row changes to 10.0 DEF on 5th row changes to 20.0 FCE on 9th row changes to 20.0 RED on 12th row changes to Nan or 99 as no equivalent match found.
What would be the best approach for this problem?
I have tried building dictionary and looping, but I keep getting stuck at the replace 0.0 for values of equivalent key.
You can use a boolean mask to hide rows equal to 0 then group by Type
and broadcast the max value (any other value than NaN
) to all rows:
m = df['Type'].eq(0) # boolean mask
df['New Type'] = df['Type'].mask(m).groupby(df['ID']).transform('max')
print(df)
# Output
ID Type Other Data New Type
0 ABC 10.0 56 10.0
1 ABC 10.0 34 10.0
2 ABC 0.0 78 10.0 # 0.0 -> 10.0
3 DEF 20.0 45 20.0
4 DEF 0.0 23 20.0 # 0.0 -> 20.0
5 ZWS 20.0 99 20.0
6 TYH 10.0 12 10.0
7 TYH 10.0 63 10.0
8 FCE 0.0 56 20.0 # 0.0 -> 20.0
9 FCE 20.0 78 20.0
10 XSA 10.0 74 10.0
11 RED 0.0 55 NaN # 0.0 -> NaN
12 QWA 20.0 74 20.0