Search code examples
pythonpandaspivot-table

Pandas pivot_table with columns as horizon


I have a dataframe like so:

first_dt     dt          val
2023-01-01   2023-01-02  1
2023-01-01   2023-01-02  1
2023-01-01   2023-01-03  1
...
2023-01-02   2023-01-03  1
2023-01-02   2023-01-04  1

You should note that dt is never less than first_dt. I would like to reshape to a pivot_table like the following

first_dt    horizon_1  horizon_2  horizon_3
2023-01-01  2          1          0
2023-01-02  1          1          0

Initially, I thought I would do the following

pandas.pivot_table(df, index='first_dt', columns='dt', values='val', aggfunc=np.sum)

However, this doesn't scale well when I have years of data when I only want a handful of horizons (e.g. 3, 7, etc).

Any thoughts about how to dynamically count dt over the first_dt time?


Solution

  • Convert your data to datetime, compute the day difference (horizon), then pivot:

    df['first_dt'] = pd.to_datetime(df['first_dt'])
    df['dt'] = pd.to_datetime(df['dt'])
    
    (df.assign(horizon=lambda x: x['dt'].sub(x['first_dt']).div(pd.Timedelta('1D')).astype(int))
       .query('horizon < 7') # query
       .pivot_table(index='first_dt', columns='horizon', values='val', aggfunc='sum')
    )
    

    Or similarly with pd.crosstab:

    pd.crosstab(
        df['first_dt'],
        pd.to_datetime(df['dt']).sub(pd.to_datetime(df['first_dt']))
          .div(pd.Timedelta('1D')).astype(int)
          .loc[lambda x: x<7],
        df['val'],
        aggfunc='sum'
    )
    

    Output:

    horizon     1  2
    first_dt        
    2023-01-01  2  1
    2023-01-02  1  1