I would like to check how many consecutive absences each student has had. I have a table called class_absenses which has the following fields:
id, student_id, present (0 for absent, 1 for present), date
I would like to return a list of students_id and how many consecutive absences they have had from today to some historical day in the past (say t-30).
If this can be done in sql then great! otherwise I am loading this into a pandas dataframe and am open to pivoting / grouping to get this done.
Thank you!
Sample Input
id student_id present date
0 1 0 4-28-2023
1 1 0 4-27-2023
2 1 1 4-26-2023
3 2 0 4-28-2023
4 2 1 4-27-2023
5 2 0 4-26-2023
6 3 1 4-28-2023
7 3 0 4-27-2023
8 3 0 4-26-2023
Output
student_id ConsecutiveAbsense
1 2
2 1
3 0
Assuming the dates are already sorted, use a custom groupby.agg
:
out = (
df.groupby('student_id')['present']
.agg(lambda s: s.eq(0).cummin().sum())
)
Output:
student_id
1 2
2 1
3 0
Name: present, dtype: int64
As DataFrame:
out = (
df.groupby('student_id', as_index=False)
.agg(ConsecutiveAbsense=('present', lambda s: s.eq(0).cummin().sum()))
)
Output:
student_id ConsecutiveAbsense
0 1 2
1 2 1
2 3 0