I have the following pandas Data Frame (without 2 the last columns):
name day show-in-appointment previous-missed-appointments proportion-previous-missed
0 Jack 2020/01/01 show 0 0
1 Jack 2020/01/02 no-show 0 0
2 Jill 2020/01/02 no-show 0 0
3 Jack 2020/01/03 show 1 0.5
4 Jill 2020/01/03 show 1 1
5 Jill 2020/01/04 no-show 1 0.5
6 Jack 2020/01/04 show 1 0.33
7 Jill 2020/01/05 show 2 0.66
8 jack 2020/01/06 no-show 1 0.25
9 jack 2020/01/07 show 2 0.4>>>2(noshow)/5(noshow+show)
df = pd.DataFrame(
data=np.asarray([
['Jack', 'Jack', 'Jill', 'Jack', 'Jill', 'Jill', 'Jack', 'Jill', 'jack', 'jack'],
[
'2020/01/01',
'2020/01/02',
'2020/01/02',
'2020/01/03',
'2020/01/03',
'2020/01/04',
'2020/01/04',
'2020/01/05',
'2020/01/06',
'2020/01/07',
],
['show', 'no-show', 'no-show', 'show', 'show', 'no-show', 'show', 'show', 'no-show', 'show'],
]).T,
columns=['name', 'day', 'show-in-appointment'],
)
previous-missed-appointments column is create like the code below:
df.name = df.name.str.capitalize()
df['order'] = df.index
df.day = pd.to_datetime(df.day)
df['noshow'] = df['show-in-appointment'].map({'show': 0, 'no-show': 1})
df = df.sort_values(by=['name', 'day'])
df['previous-missed-appointments'] = df.groupby('name').noshow.cumsum()
df.loc[df.noshow == 1, 'previous-missed-appointments'] -= 1
df = df.sort_values(by='order')
df = df.drop(columns=['noshow', 'order'])
HOW CAN I CREATE THE LAST COLUMN ????*********
You can use cumsum
and shift
in groupby.apply
for the first column, then divide by groupby.cumcount
for the second column:
# ensure dates are sorted
df = df.sort_values(by='day', key=lambda s: pd.to_datetime(s, dayfirst=False))
m = df['show-in-appointment'].eq('no-show')
g = m.groupby(df['name'].str.casefold(), group_keys=False)
df['previous-missed-appointments'] = (
g.apply(lambda x: x.cumsum().shift(fill_value=0))
)
df['proportion-previous-missed'] = (
df['previous-missed-appointments'].div(g.cumcount()).fillna(0)
)
print(df)
NB. it is important that the input is first sorted by date (or name/date).
Output:
name day show-in-appointment previous-missed-appointments proportion-previous-missed
0 Jack 2020/01/01 show 0 0.000000
1 Jack 2020/01/02 no-show 0 0.000000
2 Jill 2020/01/02 no-show 0 0.000000
3 Jack 2020/01/03 show 1 0.500000
4 Jill 2020/01/03 show 1 1.000000
5 Jill 2020/01/04 no-show 1 0.500000
6 Jack 2020/01/04 show 1 0.333333
7 Jill 2020/01/05 show 2 0.666667
8 jack 2020/01/06 no-show 1 0.250000
9 jack 2020/01/07 show 2 0.400000
Intermediates (with names grouped together and shorter column names for clarity):
name day show previous-missed computation proportion
0 Jack 2020/01/01 show 0 0/0 0.00
1 Jack 2020/01/02 no-show 0 0/1 0.00
3 Jack 2020/01/03 show 1 1/2 0.50
6 Jack 2020/01/04 show 1 1/3 0.33
8 Jack 2020/01/06 no-show 1 1/4 0.25
9 Jack 2020/01/07 show 2 2/5 0.40
2 Jill 2020/01/02 no-show 0 0/0 0.00
4 Jill 2020/01/03 show 1 1/1 1.00
5 Jill 2020/01/04 no-show 1 1/2 0.50
7 Jill 2020/01/05 show 2 2/3 0.67