import numpy as np
import pandas as pd
df = pd.DataFrame({
'cond': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'B', 'B','B','B'],
'Array': ['S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','S', 'S', 'TT', 'TT','SS','TT'],
'Area': [3.0, 2.0, 2.88, 1.33, 2.44, 1.25, 1.53, 1.0, 0.156, 2.0, 2.4, 6.3, 6.9, 9.78, 10.2, 3.0, 16.0, 19.0]
})
print(df)
I am trying to make an indicator column that indicates if the area being a certain size has already happened. So for example, if the cond. is A, then I want to indicate the first time that the area is <=1.5 (and for all data points after), and if the cond. is B, then to indicate the first time the area >10 (and for all points after). The final result should look like:
cond Array Area Indicator
0 A S 3.000 0
1 A S 2.000 0
2 A TT 2.880 0
3 A TT 1.330 1
4 A S 2.440 1
5 A S 1.250 1
6 A TT 1.530 1
7 A TT 1.000 1
8 A S 0.156 1
9 B S 2.000 0
10 B TT 2.400 0
11 B TT 6.300 0
12 B S 6.900 0
13 B S 9.780 0
14 B TT 10.200 1
15 B TT 3.000 1
16 B SS 16.000 1
17 B TT 19.000 1
A lot of the other examples I looked at were to indicate if the area for A was <=1.5, or indicate the first time it happens, but not to indicate the first time it happens AND indicate all the datapoints after. The idea is that once my condition hits a certain area, it enters a different "phase" and I'm trying to indicate when "A" enters and stays in that phase (and the equivalent for B).
You can make write the conditions, and then group by cond
and use cumsum
+ clip
:
mask = (df['cond'].eq('A') & df['Area'].lt(1.5)) | (df['cond'].eq('B') & df['Area'].gt(10))
df['Indicator'] = mask.groupby(df['cond']).cumsum().clip(0, 1)
Output:
>>> df
cond Array Area Indicator
0 A S 3.000 0
1 A S 2.000 0
2 A TT 2.880 0
3 A TT 1.330 1
4 A S 2.440 1
5 A S 1.250 1
6 A TT 1.530 1
7 A TT 1.000 1
8 A S 0.156 1
9 B S 2.000 0
10 B TT 2.400 0
11 B TT 6.300 0
12 B S 6.900 0
13 B S 9.780 0
14 B TT 10.200 1
15 B TT 3.000 1
16 B SS 16.000 1
17 B TT 19.000 1