python-3.x, pandas, dataframe, datetime, python-datetime

Pandas find hourly rolling average


My dataset df looks like this. It has one row per minute.

time, Open, High
2017-01-01 00:00:00, 1.2432, 1.1234
2017-01-01 00:01:00, 1.2432, 1.1234
2017-01-01 00:02:00, 1.2332, 1.1234
2017-01-01 00:03:00, 1.2132, 1.1234
...., ...., ....
2017-12-31 23:59:00, 1.2132, 1.1234

I want to find the hourly rolling mean of the Open column, in a way that is flexible enough to apply to other columns as well.

What did I do?

I am able to find the daily rolling average as shown below:

# Pandas code to find the rolling mean for a single day
# (the new column is named avg_1d because a keyword argument cannot start with a digit)

(
    df
    .assign(avg_1d=df.rolling(window=1*24*60)['Open'].mean())
    .groupby(df['time'].dt.date)
    .last()
)

Please note that changing window=1*24*60 to window=60 in this code does not work; I have already tried it.

The new output should look like this:

time,                 Open,  High,   Open_hour_avg
2017-01-01 00:00:00, 1.2432, 1.1234,   1.2532
2017-01-01 01:00:00, 1.2432, 1.1234,   1.2632    
2017-01-01 02:00:00, 1.2332, 1.1234,   1.2332
2017-01-01 03:00:00, 1.2132, 1.1234,   1.2432
...., ...., ...., ....
2017-12-31 23:00:00, 1.2132, 1.1234,   1.2232

Here,

2017-01-01 00:00:00, 1.2432, 1.1234, 1.2532 holds the average of the minute data for the hour starting at midnight

and 2017-01-01 01:00:00, 1.2432, 1.1234, 1.2632 holds the average of the minute data for the hour starting at 1 AM


Solution

  • This is how I got it working:

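    import pandas as pd
    
    # After your CSV data is in a df
    
    df['time'] = pd.to_datetime(df['time'])
    # Use the timestamps as the index; resample needs a datetime-like index
    df = df.set_index('time')
    # 'H' bins the rows by hour; pandas 2.2+ deprecates this alias in favour of 'h'
    df_mean = df.resample('H').mean()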
    
    
    time,                 Open       High   
    2017-01-01 00:00:00 1.051488    1.051500     
    2017-01-01 01:00:00 1.051247    1.051275     
    2017-01-01 02:00:00 1.051890    1.051957     
    2017-01-01 03:00:00 1.051225    1.051290     
    ...., ...., ....
    2017-12-31 23:00:00 1.051225    1.051290
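
If you want the result shaped like the table in the question (the original columns plus an extra `<column>_hour_avg` column for whichever columns you pick), the resample approach above can be wrapped in a small helper. This is only a sketch under some assumptions: the function name `hourly_average`, its parameters, and the synthetic `sample` data are made up for illustration, and I am assuming the Open/High values shown per hour in the desired output are the first values in that hour. The last lines also show a time-based `rolling("60min")` window, which is a different thing: a trailing 60-minute mean for every minute rather than one value per clock hour.

```python
import numpy as np
import pandas as pd


def hourly_average(df, columns, time_col="time"):
    """One row per hour: the first value of every column in that hour,
    plus a <column>_hour_avg column with the mean over that hour's rows."""
    indexed = df.assign(**{time_col: pd.to_datetime(df[time_col])}).set_index(time_col)
    # Assumption: the per-hour Open/High shown in the question are the first values.
    first = indexed.resample("60min").first()
    # Average only the requested columns and tag them with a suffix.
    means = indexed[columns].resample("60min").mean().add_suffix("_hour_avg")
    return first.join(means)


# Synthetic minute data (three hours) standing in for the real CSV.
times = pd.date_range("2017-01-01 00:00:00", periods=180, freq="min")
sample = pd.DataFrame({
    "time": times,
    "Open": np.arange(180, dtype=float),
    "High": np.arange(180, dtype=float) + 1.0,
})

result = hourly_average(sample, ["Open"])
print(result)

# For comparison: a trailing 60-minute mean computed at every minute.
trailing = sample.set_index("time")["Open"].rolling("60min").mean()
print(trailing.head())
```

Passing `["Open", "High"]` (or any other list of numeric columns) gives one `_hour_avg` column per entry, which covers the "flexible" part of the question.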