Below is a toy Pandas dataframe that has three columns: 'id' (group id), 'b' (for condition), and 'c' (target):
df = pd.DataFrame({'id' : [1,1,1,1,1,1,1,2,2,2,2,2,2,2],
'b' : [3,4,5,'A',3,4,'A',1,'A',1,3,'A',2,3],
'c' : [1,0,1,10,1,1,20,1,10,0,1,20,1,1]})
print(df)
id b c
0 1 3 1
1 1 4 0
2 1 5 1
3 1 A 10
4 1 3 1
5 1 4 1
6 1 A 20
7 2 1 1
8 2 A 10
9 2 1 0
10 2 3 1
11 2 A 20
12 2 2 1
13 2 3 1
For each group, I want to replace the values in column 'c' with nan (i.e., np.nan
) before the first occurrence of 'A' in column 'b'.
The desired output is the following:
desired_output_df = pd.DataFrame({'id' : [1,1,1,1,1,1,1,2,2,2,2,2,2,2],
'b' : [3,4,5,'A',3,4,'A',1,'A',1,3,'A',2,3],
'c' : [np.nan,np.nan,np.nan,10,1,1,20,np.nan,10,0,1,20,1,1]})
print(desired_output_df)
id b c
0 1 3 NaN
1 1 4 NaN
2 1 5 NaN
3 1 A 10.0
4 1 3 1.0
5 1 4 1.0
6 1 A 20.0
7 2 1 NaN
8 2 A 10.0
9 2 1 0.0
10 2 3 1.0
11 2 A 20.0
12 2 2 1.0
13 2 3 1.0
I am able to get the index of the values of column c that I want to change using the following command: df.groupby('id').apply(lambda x: x.loc[:(x.b == 'A').idxmax()-1]).index
. But the result is a "MultiIndex" and I can't seem to use it to replace the values.
MultiIndex([(1, 0),
(1, 1),
(1, 2),
(2, 7)],
names=['id', None])
Thanks in advance.
Try:
df['c'] = np.where(df.groupby('id').apply(lambda x: x['b'].eq('A').cumsum()) > 0, df['c'], np.nan)
print(df)
Prints:
id b c
0 1 3 NaN
1 1 4 NaN
2 1 5 NaN
3 1 A 10.0
4 1 3 1.0
5 1 4 1.0
6 1 A 20.0
7 2 1 NaN
8 2 A 10.0
9 2 1 0.0
10 2 3 1.0
11 2 A 20.0
12 2 2 1.0
13 2 3 1.0