Search code examples
pythonpandasmoderolling-computation

Python - Calculating a rolling mode on a dataframe


I have a dataset that is reporting values for a specific date, that can then be updated on subsequent dates, thus creating 2 columns, Date and Reported_Date, for each Reported_Value. There is a separate ID field that is my dataframe's index. I want to calculate the mode and max for the last 5 reported dates. I know I can use dataset['Reported_Value'].rolling(5).max() to calculate the max, but trying rolling with mode leads to an error, 'Rolling' object has no attribute 'mode'. Does anyone know how this could be achieved? Is there also a way to have it only compute across one Date? So that the first few values of 2021-12-02 weren't using 2021-12-01 values?

Example DataFrame:

    ID    Date          Reported_Date    Reported_Value    Max_Last_5_Reported_Days
     1    2021-12-01    2021-12-10                   5                 NaN
     2    2021-12-01    2021-12-11                   6                 NaN
     3    2021-12-01    2021-12-12                   5                 NaN
     4    2021-12-01    2021-12-13                   3                 NaN
     5    2021-12-01    2021-12-14                   2                 6
     6    2021-12-01    2021-12-15                   11                11
     7    2021-12-01    2021-12-16                   7                 11
     8    2021-12-01    2021-12-17                   5                 11
     9    2021-12-01    2021-12-18                   6                 11
     10   2021-12-01    2021-12-19                   7                 11
     11   2021-12-02    2021-12-10                   2                 7
     12   2021-12-02    2021-12-11                   3                 7
     13   2021-12-02    2021-12-12                   2                 7
     14   2021-12-02    2021-12-13                   4                 7
     15   2021-12-02    2021-12-14                   4                 4
     16   2021-12-02    2021-12-15                   4                 4
     17   2021-12-02    2021-12-16                   3                 4
     18   2021-12-02    2021-12-17                   4                 4
     19   2021-12-02    2021-12-18                   2                 4
     20   2021-12-02    2021-12-19                   4                 4

Desired DataFrame:

    ID    Date          Reported_Date    Reported_Value    Max_Last_5_Report_Days   Mode_L5RD
     1    2021-12-01    2021-12-10                   5                 NaN             NaN
     2    2021-12-01    2021-12-11                   6                 NaN             NaN
     3    2021-12-01    2021-12-12                   5                 NaN             NaN
     4    2021-12-01    2021-12-13                   3                 NaN             NaN
     5    2021-12-01    2021-12-14                   2                 6               5
     6    2021-12-01    2021-12-15                   11                11              NaN
     7    2021-12-01    2021-12-16                   6                 11              NaN
     8    2021-12-01    2021-12-17                   5                 11              NaN
     9    2021-12-01    2021-12-18                   6                 11              6
     10   2021-12-01    2021-12-19                   6                 11              6
     11   2021-12-02    2021-12-10                   2                 NaN             NaN
     12   2021-12-02    2021-12-11                   3                 NaN             NaN
     13   2021-12-02    2021-12-12                   2                 NaN             NaN
     14   2021-12-02    2021-12-13                   4                 NaN             NaN
     15   2021-12-02    2021-12-14                   4                 4               4
     16   2021-12-02    2021-12-15                   4                 4               4
     17   2021-12-02    2021-12-16                   3                 4               4
     18   2021-12-02    2021-12-17                   4                 4               4
     19   2021-12-02    2021-12-18                   2                 4               4
     20   2021-12-02    2021-12-19                   4                 4               4

I'm not sure how I would convey that there are multiple mode values, so they're listed as NaN in the example.


Solution

  • groupby "Date" and use rolling_max for max for last 5 days; apply scipy.stats.mode for mode:

    from scipy.stats import mode
    rolling_obj = df.groupby('Date')['Reported_Value'].rolling(5)
    df['Max_Last_5_Report_Days'] = rolling_obj.max().droplevel(0)
    df['Mode_L5RD'] = rolling_obj.apply(lambda x: mode(x)[0]).droplevel(0)
    

    Output:

        ID        Date Reported_Date  Reported_Value  Max_Last_5_Reported_Days  \
    0    1  2021-12-01    2021-12-10               5                       NaN   
    1    2  2021-12-01    2021-12-11               6                       NaN   
    2    3  2021-12-01    2021-12-12               5                       NaN   
    3    4  2021-12-01    2021-12-13               3                       NaN   
    4    5  2021-12-01    2021-12-14               2                       6.0   
    5    6  2021-12-01    2021-12-15              11                      11.0   
    6    7  2021-12-01    2021-12-16               7                      11.0   
    7    8  2021-12-01    2021-12-17               5                      11.0   
    8    9  2021-12-01    2021-12-18               6                      11.0   
    9   10  2021-12-01    2021-12-19               7                      11.0   
    10  11  2021-12-02    2021-12-10               2                       7.0   
    11  12  2021-12-02    2021-12-11               3                       7.0   
    12  13  2021-12-02    2021-12-12               2                       7.0   
    13  14  2021-12-02    2021-12-13               4                       7.0   
    14  15  2021-12-02    2021-12-14               4                       4.0   
    15  16  2021-12-02    2021-12-15               4                       4.0   
    16  17  2021-12-02    2021-12-16               3                       4.0   
    17  18  2021-12-02    2021-12-17               4                       4.0   
    18  19  2021-12-02    2021-12-18               2                       4.0   
    19  20  2021-12-02    2021-12-19               4                       4.0   
    
        Max_Last_5_Report_Days  Mode_L5RD  
    0                      NaN        NaN  
    1                      NaN        NaN  
    2                      NaN        NaN  
    3                      NaN        NaN  
    4                      6.0        5.0  
    5                     11.0        2.0  
    6                     11.0        2.0  
    7                     11.0        2.0  
    8                     11.0        2.0  
    9                     11.0        7.0  
    10                     NaN        NaN  
    11                     NaN        NaN  
    12                     NaN        NaN  
    13                     NaN        NaN  
    14                     4.0        2.0  
    15                     4.0        4.0  
    16                     4.0        4.0  
    17                     4.0        4.0  
    18                     4.0        4.0  
    19                     4.0        4.0