Search code examples
pythonpython-3.xpandaspandas-groupbyresampling

Pandas Resampling with delta time after specific starting time


After reading a CSV into a data frame, I am trying to resample my "Value" column to 5 seconds, starting from the first rounded second of the time value. I would like to have the mean for all the values within the next 5 seconds, starting from 46:19.6 (format %M:%S:%f). So the code would give me the mean for 46:20, then 46:25, and so on...Does anybody know how to do this? Thank you!

input:

df = pd.DataFrame({'Time': {0: '46:19.6',
  1: '46:20.7',
  2: '46:21.8',
  3: '46:22.9',
  4: '46:24.0',
  5: '46:25.1',
  6: '46:26.2',
  7: '46:27.6',
  8: '46:28.7',
  9: '46:29.8',
  10: '46:30.9',
  11: '46:32.0',
  12: '46:33.2',
  13: '46:34.3',
  14: '46:35.3',
  15: '46:36.5',
  16: '46:38.8',
  17: '46:40.0'},
 'Value': {0: 0,
  1: 1,
  2: 2,
  3: 3,
  4: 4,
  5: 5,
  6: 6,
  7: 8,
  8: 9,
  9: 10,
  10: 11,
  11: 12,
  12: 13,
  13: 14,
  14: 15,
  15: 17,
  16: 19,
  17: 20}})

Solution

  • Assuming your Time field is in datetime64[ns] format, you simply can use pd.Grouper and pass freq=5S:

    # next line of code is optional to transform to datetime format if the `Time` field is an `object` i.e. string.
    # df['Time'] = pd.to_datetime('00:'+df['Time'])
    
    df1 = df.groupby(pd.Grouper(key='Time', freq='5S'))['Value'].mean().reset_index()
    
    #Depending on what you want to do, you can also replace the above line of code with one of two below:
    #df1 = df.groupby(pd.Grouper(key='Time', freq='5S'))['Value'].mean().reset_index().iloc[1:]
    #df1 = df.groupby(pd.Grouper(key='Time', freq='5S', base=4.6))['Value'].mean().reset_index()
    #In the above line of code 4.6s can be adjusted to whatever number between 0 and 5.
    df1
    

    output:

        Time                 Value
    0   2020-07-07 00:46:15  0.0
    1   2020-07-07 00:46:20  2.5
    2   2020-07-07 00:46:25  7.6
    3   2020-07-07 00:46:30  12.5
    4   2020-07-07 00:46:35  17.0
    5   2020-07-07 00:46:40  20.0
    

    Full reproducible code from an example DataFrame I created:

    import re
    import pandas
    df = pd.DataFrame({'Time': {0: '46:19.6',
      1: '46:20.7',
      2: '46:21.8',
      3: '46:22.9',
      4: '46:24.0',
      5: '46:25.1',
      6: '46:26.2',
      7: '46:27.6',
      8: '46:28.7',
      9: '46:29.8',
      10: '46:30.9',
      11: '46:32.0',
      12: '46:33.2',
      13: '46:34.3',
      14: '46:35.3',
      15: '46:36.5',
      16: '46:38.8',
      17: '46:40.0'},
     'Value': {0: 0,
      1: 1,
      2: 2,
      3: 3,
      4: 4,
      5: 5,
      6: 6,
      7: 8,
      8: 9,
      9: 10,
      10: 11,
      11: 12,
      12: 13,
      13: 14,
      14: 15,
      15: 17,
      16: 19,
      17: 20}})
    df['Time'] = pd.to_datetime('00:'+df['Time'])
    df1 = df.groupby(pd.Grouper(key='Time', freq='5S'))['Value'].mean().reset_index()
    df1