Search code examples
pythonpandasdata-wranglingpandas-resample

Pandas Resample 2M and 3M for every month


I have this code to calculate Returns:

import yfinance as yf
import numpy as np
import pandas as pd

df = yf.download('SPY', '2023-01-01')
df = df[['Close']]
df['d_returns'] = np.log(df.div(df.shift(1)))
df.dropna(inplace = True)

df_1M = pd.DataFrame()
df_2M = pd.DataFrame()
df_3M = pd.DataFrame()

df_1M['1M cummreturns'] = df.d_returns.cumsum().apply(np.exp)
df_2M['2M cummreturns']= df.d_returns.cumsum().apply(np.exp)
df_3M['3M cummreturns'] = df.d_returns.cumsum().apply(np.exp)

df1 = df_1M[['1M cummreturns']].resample('1M').max()
df2 = df_2M[['2M cummreturns']].resample('2M').max()
df3 = df_3M[['3M cummreturns']].resample('3M').max()

df1 = pd.concat([df1, df2, df3], axis=1)
df1

This gives the following:

            1M cummreturns  2M cummreturns  3M cummreturns
Date            
2023-01-31  1.067381        1.067381        1.067381
2023-02-28  1.094428        NaN             NaN
2023-03-31  1.075022        1.094428        NaN
2023-04-30  1.092196        NaN             1.094428
2023-05-31  1.103356        1.103356        NaN
2023-06-30  1.164014        NaN             NaN
2023-07-31  1.202116        1.202116        1.202116
2023-08-31  1.198677        NaN             NaN
2023-09-30  1.184785        1.198677        NaN
2023-10-31  1.145738        NaN             1.198677
2023-11-30  1.198466        1.198466        NaN
2023-12-31  1.251746        NaN             NaN
2024-01-31  1.290032        1.290032        1.290032
2024-02-29  1.334174        NaN             NaN
2024-03-31  1.346699        1.346699        NaN
2024-04-30  NaN             NaN             1.346699

How to get valid values in 2M cummreturns and 3M cummreturns columns for every row?

For instance, 2023-02-28 row represents Feb-2023 month. The columns 2M cummreturns and 3M cummreturns need to have max returns in the next 2 Months and 3 Months time respectively starting from Feb-2023 the same way 1M cummreturns gives max returns in the next 1 Month time.


Solution

  • IIUC, you only want to resample for the first column, to get the months. Then compute a rolling.max:

    # compute once the max per month
    tmp = df.d_returns.cumsum().apply(np.exp).resample('1M').max()
    
    # then roll over the above with different windows
    N = 3 # number of columns to generate
    out = pd.DataFrame({f'{x+1}M cummreturns': tmp.rolling(x+1, min_periods=1).max()
                        for x in range(N)})
    

    Output:

                1M cummreturns  2M cummreturns  3M cummreturns
    Date                                                      
    2023-01-31        1.067381        1.067381        1.067381
    2023-02-28        1.094428        1.094428        1.094428
    2023-03-31        1.075022        1.094428        1.094428
    2023-04-30        1.092196        1.092196        1.094428
    2023-05-31        1.103356        1.103356        1.103356
    2023-06-30        1.164014        1.164014        1.164014
    2023-07-31        1.202116        1.202116        1.202116
    2023-08-31        1.198677        1.202116        1.202116
    2023-09-30        1.184785        1.198677        1.202116
    2023-10-31        1.145738        1.184785        1.198677
    2023-11-30        1.198466        1.198466        1.198466
    2023-12-31        1.251746        1.251746        1.251746
    2024-01-31        1.290032        1.290032        1.290032
    2024-02-29        1.334174        1.334174        1.334174
    2024-03-31        1.346699        1.346699        1.346699