Pandas pivot_table with columns as horizon

I have a dataframe like so:

first_dt     dt          val
2023-01-01   2023-01-02  1
2023-01-01   2023-01-02  1
2023-01-01   2023-01-03  1
...
2023-01-02   2023-01-03  1
2023-01-02   2023-01-04  1

You should note that dt is never less than first_dt. I would like to reshape to a pivot_table like the following

first_dt    horizon_1  horizon_2  horizon_3
2023-01-01  2          1          0
2023-01-02  1          1          0

Initially, I thought I would do the following

pandas.pivot_table(df, index='first_dt', columns='dt', values='val', aggfunc=np.sum)

However, this doesn't scale well when I have years of data when I only want a handful of horizons (e.g. 3, 7, etc).

Any thoughts about how to dynamically count dt over the first_dt time?

Solution

Convert your data to datetime, compute the day difference (horizon), then pivot:

df['first_dt'] = pd.to_datetime(df['first_dt'])
df['dt'] = pd.to_datetime(df['dt'])

(df.assign(horizon=lambda x: x['dt'].sub(x['first_dt']).div(pd.Timedelta('1D')).astype(int))
   .query('horizon < 7') # query
   .pivot_table(index='first_dt', columns='horizon', values='val', aggfunc='sum')
)

Or similarly with pd.crosstab:

pd.crosstab(
    df['first_dt'],
    pd.to_datetime(df['dt']).sub(pd.to_datetime(df['first_dt']))
      .div(pd.Timedelta('1D')).astype(int)
      .loc[lambda x: x<7],
    df['val'],
    aggfunc='sum'
)

Output:

horizon     1  2
first_dt        
2023-01-01  2  1
2023-01-02  1  1