Search code examples
pythonpandasdataframedatetimetrading

How do I change the name of a resampled column?


I have a dataframe with the price fluctuations of the Nasdaq stock index every minute. In trading it is important to take into account data on different time units (to know the short term, medium and long term trends...)

So I used the resample() method of Pandas to get a dataframe with the price in 5 minutes in addition to the original 1 minute:

df1m = pd.DataFrame({
'Time' : ['2022-01-11 09:30:00', '2022-01-11 09:31:00', '2022-01-11 09:32:00', '2022-01-11 09:33:00', '2022-01-11 09:34:00', '2022-01-11 09:35:00', '2022-01-11 09:36:00' , '2022-01-11 09:37:00' , '2022-01-11 09:38:00' ,
       '2022-01-11 09:39:00', '2022-01-11 09:40:00'],  
'Price' : [1,2,3,4,5,6,7,8,9,10,11]})
df1m['Time'] = pd.to_datetime(df1m['Time'])
df1m.set_index(['Time'], inplace =True)

df5m = df1m.resample('5min').first() 

I renamed the column names to 5min :

df5m.rename(columns={'Price' : 'Price5'})

Unfortunately the change of column names is no longer taken into account when the two dataframes (1 and 5 min) are put together:

df_1m_5m = pd.concat([df1m, df5m], axis=1)

How to rename definitively the columns created for the 5min data and avoid having twice the same column name for different data?


Solution

  • You can use:

    df5m = df1m.resample('5min').first().add_suffix('5')
    df_1m_5m = pd.concat([df1m, df5m], axis=1)
    

    Output:

    >>> df_1m_5m
                         Price  Price5
    Time                              
    2022-01-11 09:30:00      1     1.0
    2022-01-11 09:31:00      2     NaN
    2022-01-11 09:32:00      3     NaN
    2022-01-11 09:33:00      4     NaN
    2022-01-11 09:34:00      5     NaN
    2022-01-11 09:35:00      6     6.0
    2022-01-11 09:36:00      7     NaN
    2022-01-11 09:37:00      8     NaN
    2022-01-11 09:38:00      9     NaN
    2022-01-11 09:39:00     10     NaN
    2022-01-11 09:40:00     11    11.0
    

    You forgot to reassign the result to your dataframe:

    df5m = df5m.rename(columns={'Price' : 'Price5'})
    
    # OR
    
    df5m.rename(columns={'Price' : 'Price5'}, inplace=True)
    

    Output:

    >>> df5m
                         Price5
    Time                       
    2022-01-11 09:30:00       1
    2022-01-11 09:35:00       6
    2022-01-11 09:40:00      11