Search code examples
pythonpandasgroup-bypandas-resample

Alternating averaging times every row


This is probably a silly question that was already answered, but I'm struggling to find the answer I need. I'm looking to average a large dataset with 1-second times, switching between averaging every 10 minutes then every 20 minutes, then back to 10 minutes consecutively. I can use the pandas 'resample' function to get the full dataset as solely 10-minute or 20-minute averages, but I don't know how to approach alternating between the two.

Here's a sample of my data:

df= pd.DataFrame({'time': time, 'velocity': vel})

                     time           velocity
0     2023-08-29 00:00:10           0.133775
1     2023-08-29 00:00:11           0.152900
2     2023-08-29 00:00:12           0.114675
3     2023-08-29 00:00:13          -0.019125
4     2023-08-29 00:00:14          -0.076450
                  ...                ...
75537 2023-08-29 23:59:55          -0.057325
75538 2023-08-29 23:59:56          -0.095550
75539 2023-08-29 23:59:57          -0.038225
75540 2023-08-29 23:59:58          -0.038225
75541 2023-08-29 23:59:59          -0.057325

I would like to get this, averaging by 10 minutes, the 20, then 10, then 20 and so on, preferably using df.resample:

                     time           velocity
0     2023-08-29 00:00:00           0.003069
1     2023-08-29 00:00:10           0.000212
2     2023-08-29 00:00:30          -0.013170
3     2023-08-29 00:00:40          -0.015599
4     2023-08-29 00:01:00           0.010070

Any and all help is appreciated, and if this is a duplicate question, please direct me to that question because I'm not finding the examples I need. Thank you!


Solution

  • Example

    we need reproducible example to solve your problem. I will make a slightly different example for reproduciblilty, try to understand the logic based on it.

    import pandas as pd
    import numpy as np
    time = pd.date_range('2023-01-01', periods=30, freq='15s')
    df = pd.DataFrame({'time':time, 'velocity':np.random.randint(0, 10, 30)})
    

    df.head(5)

        time                velocity
    0   2023-01-01 00:00:00 7
    1   2023-01-01 00:00:15 9
    2   2023-01-01 00:00:30 8
    3   2023-01-01 00:00:45 9
    4   2023-01-01 00:01:00 0
    

    Code

    I will try to calculate the average for 1min and 2min. First, let's resample it to 1min for that.

    tmp = df.resample('T', on='time')['velocity'].mean().reset_index()
    

    tmp

        time                velocity
    0   2023-01-01 00:00:00 8.25
    1   2023-01-01 00:01:00 3.00
    2   2023-01-01 00:02:00 1.75
    3   2023-01-01 00:03:00 4.00
    4   2023-01-01 00:04:00 5.25
    5   2023-01-01 00:05:00 3.75
    6   2023-01-01 00:06:00 6.00
    7   2023-01-01 00:07:00 5.00
    

    make tmp -> 1min & 2min

    grp = (tmp.index % 3).isin([0, 1]).cumsum()
    out = tmp.groupby(grp).agg({'time':'first', 'velocity':'mean'})
    

    out:

        time                velocity
    1   2023-01-01 00:00:00 8.250
    2   2023-01-01 00:01:00 2.375
    3   2023-01-01 00:03:00 4.000
    4   2023-01-01 00:04:00 4.500
    5   2023-01-01 00:06:00 6.000
    6   2023-01-01 00:07:00 5.000
    

    Rewrite this code for 10min and 20min.