I hope someone can help me out with this! I haven't found anything online that comes close enough.
Sample data:
import pandas as pd
sample_data = {
'id': [1,1,1,1,1,2,2,2,2,2],
'date_rank': [1,2,3,4,5,1,2,3,4,5],
'candidates': [1,0,0,3,0,0,0,0,2,0],
'desired_output':['New_filled','New_open','Double_open','Double_filled','New_open','New_open','Double_open','Double_open','Double_filled','New_open']
}
df = pd.DataFrame(sample_data, columns=['id', 'date_rank','candidates', 'desired_output'])
df
In the sample_data output below the "desired_output" column shows the desired result:
id date_rank candidates desired_output
0 1 1 1 New_filled
1 1 2 0 New_open
2 1 3 0 Double_open
3 1 4 3 Double_filled
4 1 5 0 New_open
5 2 1 0 New_open
6 2 2 0 Double_open
7 2 3 0 Double_open
8 2 4 2 Double_filled
9 2 5 0 New_open
The date_rank column isn't that important except for the first entry.
The first entry will always be "new" but could be either "filled" or "open". It's open when 0 candidates were hired and closed if one or more candidates were hired. This applies to the rest of the entries as well.
If an entry is filled, the next row will always be new.
If an entry is open because there were no candidates, the next entry will always be double.
If you look at the fourth row you'll see that an entry can be double and filled as long as the previous row was open.
There are four possible values/conditions in the desired_ouptut column. I can make this work with less conditions but not with four, especially when the value depends on the previous row value.
You can use two simple conditionals with numpy.where
on the current row, and the previous one (with groupby.shift
):
m = df['candidates'].eq(0)
df['output'] = pd.Series(np.where(m.groupby(df['id']).shift(fill_value=False),
'Double_', 'New_'), index=df.index
).add(np.where(m, 'open', 'filled'))
In numpy:
m = df['candidates'].eq(0)
a1 = np.where(m.groupby(df['id']).shift(fill_value=False), 'Double_', 'New_')
a2 = np.where(m, 'open', 'filled')
df['output'] = np.core.defchararray.add(a1, a2)
Output:
id date_rank candidates desired_output output
0 1 1 1 New_filled New_filled
1 1 2 0 New_open New_open
2 1 3 0 Double_open Double_open
3 1 4 3 Double_filled Double_filled
4 1 5 0 New_open New_open
5 2 1 0 New_open New_open
6 2 2 0 Double_open Double_open
7 2 3 0 Double_open Double_open
8 2 4 2 Double_filled Double_filled
9 2 5 0 New_open New_open