Search code examples
pythonpandasdataframegroup-bybinning

Find the average for user-defined window in pandas


I have a pandas dataframe that has raw heart rate data with the index of time (in seconds).

I am trying to bin the data so that I can have the average of a user define window (e.g. 10s) - not a rolling average, just an average of 10s, then the 10s following, etc.

import pandas as pd

hr_raw = pd.read_csv('hr_data.csv', index_col='time')
print(hr_raw)

      heart_rate
time            
0.6        164.0
1.0        182.0
1.3        164.0
1.6        150.0
2.0        152.0
2.4        141.0
2.9        163.0
3.2        141.0
3.7        124.0
4.2        116.0
4.7        126.0
5.1        116.0
5.7        107.0

Using the example data above, I would like to be able to set a user defined window size (let's use 2 seconds) and produce a new dataframe that has index of 2sec increments and averages the 'heart_rate' values if the time falls into that window (and should continue to the end of the dataframe).

For example:

      heart_rate
time            
2.0        162.40
4.0        142.25
6.0        116.25

I can only seem to find methods to bin the data based on a predetermined number of bins (e.g. making a histogram) and this only returns the count/frequency.

thanks.


Solution

  • A groupby should do it.

    df.groupby((df.index // 2 + 1) * 2).mean()
    
          heart_rate
    time            
    2.0       165.00
    4.0       144.20
    6.0       116.25
    

    Note that the reason for the slight difference between our answers is that the upper bound is excluded. That means, a reading taken at 2.0s will be considered for the 4.0s time interval. This is how it is usually done, a similar solution with the TimeGrouper will yield the same result.