Search code examples
pythonsqlpandasgaps-and-islands

Count consecutive occurrences in sql and/or pandas


I would like to check how many consecutive absences each student has had. I have a table called class_absenses which has the following fields:

id, student_id, present (0 for absent, 1 for present), date

I would like to return a list of students_id and how many consecutive absences they have had from today to some historical day in the past (say t-30).

If this can be done in sql then great! otherwise I am loading this into a pandas dataframe and am open to pivoting / grouping to get this done.

Thank you!

Sample Input

id      student_id      present      date
0       1               0            4-28-2023
1       1               0            4-27-2023
2       1               1            4-26-2023
3       2               0            4-28-2023
4       2               1            4-27-2023
5       2               0            4-26-2023
6       3               1            4-28-2023
7       3               0            4-27-2023
8       3               0            4-26-2023

Output

student_id     ConsecutiveAbsense
1              2
2              1
3              0

Solution

  • Assuming the dates are already sorted, use a custom groupby.agg:

    out = (
     df.groupby('student_id')['present']
       .agg(lambda s: s.eq(0).cummin().sum())
    )
    

    Output:

    student_id
    1    2
    2    1
    3    0
    Name: present, dtype: int64
    

    As DataFrame:

    out = (
     df.groupby('student_id', as_index=False)
       .agg(ConsecutiveAbsense=('present', lambda s: s.eq(0).cummin().sum()))
    )
    

    Output:

       student_id  ConsecutiveAbsense
    0           1                   2
    1           2                   1
    2           3                   0