I have a dataframe where the column names are dates and want to sum between the 4 weeks
dataframe
ID |20220101 |20220107 |20220114 |20220121 |20220128
3893 | 0 | 1.34 | 1.2 | 0.2 | 0
3126 | 0 | 1.34 | 1.2 | 0.2 | 0
3122 | 1 | 1.22 | 0.76 | 2.2 | 1.1
current_date = datetime(2023, 5, 12)
start_of_year = datetime(2023, 1, 1)
print(current_date, start_of_year)
current_4_wks_start = current_date
current_4_wks_end = current_date - timedelta(weeks=4)
print(current_4_wks_start, current_4_wks_end)
past_4_wks_start = current_4_wks_end
past_4_wks_end = past_4_wks_start - timedelta(weeks=4)
print(past_4_wks_start, past_4_wks_end)
I want to create a column with the sum for each row between the last 4 weeks. Is there a way I can parse through the column names as dates and if they fall between the past or current week dates then sum?
Convert it to a long format, then perform your sum grouped by ID.
df_long = (df.set_index('ID')
.stack()
.reset_index()
.rename({'level_1': 'date', '0': 'value'}, axis=1)
)
df_long['date'] = pd.to_datetime(df_long['date'], format='%Y%m%d')
sum_by_id = df_long[df_long['date'].between(past_4_wks_start, past_4_wks_end)].groupby('ID').sum('value')