I have a dataframe like so:
first_dt dt val
2023-01-01 2023-01-02 1
2023-01-01 2023-01-02 1
2023-01-01 2023-01-03 1
...
2023-01-02 2023-01-03 1
2023-01-02 2023-01-04 1
You should note that dt
is never less than first_dt
. I would like to reshape to a pivot_table
like the following
first_dt horizon_1 horizon_2 horizon_3
2023-01-01 2 1 0
2023-01-02 1 1 0
Initially, I thought I would do the following
pandas.pivot_table(df, index='first_dt', columns='dt', values='val', aggfunc=np.sum)
However, this doesn't scale well when I have years of data when I only want a handful of horizons (e.g. 3, 7, etc).
Any thoughts about how to dynamically count dt
over the first_dt
time?
Convert your data to datetime, compute the day difference (horizon), then pivot:
df['first_dt'] = pd.to_datetime(df['first_dt'])
df['dt'] = pd.to_datetime(df['dt'])
(df.assign(horizon=lambda x: x['dt'].sub(x['first_dt']).div(pd.Timedelta('1D')).astype(int))
.query('horizon < 7') # query
.pivot_table(index='first_dt', columns='horizon', values='val', aggfunc='sum')
)
Or similarly with pd.crosstab
:
pd.crosstab(
df['first_dt'],
pd.to_datetime(df['dt']).sub(pd.to_datetime(df['first_dt']))
.div(pd.Timedelta('1D')).astype(int)
.loc[lambda x: x<7],
df['val'],
aggfunc='sum'
)
Output:
horizon 1 2
first_dt
2023-01-01 2 1
2023-01-02 1 1