Search code examples
pandasdataframegroup-by

Get the sum of columns between dates


I have a dataframe where the column names are dates and want to sum between the 4 weeks

dataframe

 ID   |20220101  |20220107 |20220114 |20220121 |20220128
 3893 |  0       | 1.34    | 1.2     | 0.2     | 0
 3126 |  0       | 1.34    | 1.2     | 0.2     | 0
 3122 |  1       | 1.22    | 0.76    | 2.2     | 1.1
current_date = datetime(2023, 5, 12)
start_of_year = datetime(2023, 1, 1)
print(current_date, start_of_year)

current_4_wks_start = current_date
current_4_wks_end   = current_date - timedelta(weeks=4)
print(current_4_wks_start, current_4_wks_end)

past_4_wks_start = current_4_wks_end 
past_4_wks_end = past_4_wks_start - timedelta(weeks=4)
print(past_4_wks_start, past_4_wks_end) 

I want to create a column with the sum for each row between the last 4 weeks. Is there a way I can parse through the column names as dates and if they fall between the past or current week dates then sum?


Solution

  • Convert it to a long format, then perform your sum grouped by ID.

    df_long = (df.set_index('ID')
                 .stack()
                 .reset_index()
                 .rename({'level_1': 'date', '0': 'value'}, axis=1)
              )
    
    df_long['date'] = pd.to_datetime(df_long['date'], format='%Y%m%d')
    
    sum_by_id = df_long[df_long['date'].between(past_4_wks_start, past_4_wks_end)].groupby('ID').sum('value')