Search code examples
pythonpandasgroup-by

How to average based on data range in a difference table in Pandas


With two tables, Values and dates, I would like to get the average value between the date ranges. Values looks like:

Date Value
2023-01-01 10:00 1
2023-01-01 11:00 2
2023-01-02 10:00 4
2023-01-04 10:00 4
2023-01-07 10:00 4

and dates looks like

Group StartDay EndDay
1 2023-01-01 2023-01-05
2 2023-01-03 2023-01-10

As you can see, the date ranges can overlap.

I am trying to calculate the averages over these ranges, so in this example the output should be something along the lines of

Group StartDay EndDay Mean
1 2023-01-01 2023-01-05 2.75
2 2023-01-03 2023-01-10 4

Currently my code looks like (all one line):

Values.groupby(np.where(Values['Date'].between(Dates['StartDay'],Dates['EndDay']),'pre','post'))['value'].mean()

however this results in ValueError: Can only compare identically-labeled Series objects

This was based on other similar questions, however does not appear to apply here due to it being over two tables / using ranges.


Solution

  • Try:

    # convert the values to datetime (if not already):
    df1['Date'] = pd.to_datetime(df1['Date'])
    df2['StartDay'] = pd.to_datetime(df2['StartDay'])
    df2['EndDay'] = pd.to_datetime(df2['EndDay'])
    
    df1 = df1.sort_values(by='Date').set_index('Date')
    
    df2['Mean'] = df2.apply(lambda x: df1[x['StartDay']:x['EndDay']].mean(), axis=1)
    print(df2)
    

    Prints:

       Group   StartDay     EndDay  Mean
    0      1 2023-01-01 2023-01-05  2.75
    1      2 2023-01-03 2023-01-10  4.00