With two tables, Values
and dates
, I would like to get the average value between the date ranges.
Values
looks like:
Date | Value |
---|---|
2023-01-01 10:00 | 1 |
2023-01-01 11:00 | 2 |
2023-01-02 10:00 | 4 |
2023-01-04 10:00 | 4 |
2023-01-07 10:00 | 4 |
and dates
looks like
Group | StartDay | EndDay |
---|---|---|
1 | 2023-01-01 | 2023-01-05 |
2 | 2023-01-03 | 2023-01-10 |
As you can see, the date ranges can overlap.
I am trying to calculate the averages over these ranges, so in this example the output should be something along the lines of
Group | StartDay | EndDay | Mean |
---|---|---|---|
1 | 2023-01-01 | 2023-01-05 | 2.75 |
2 | 2023-01-03 | 2023-01-10 | 4 |
Currently my code looks like (all one line):
Values.groupby(np.where(Values['Date'].between(Dates['StartDay'],Dates['EndDay']),'pre','post'))['value'].mean()
however this results in
ValueError: Can only compare identically-labeled Series objects
This was based on other similar questions, however does not appear to apply here due to it being over two tables / using ranges.
Try:
# convert the values to datetime (if not already):
df1['Date'] = pd.to_datetime(df1['Date'])
df2['StartDay'] = pd.to_datetime(df2['StartDay'])
df2['EndDay'] = pd.to_datetime(df2['EndDay'])
df1 = df1.sort_values(by='Date').set_index('Date')
df2['Mean'] = df2.apply(lambda x: df1[x['StartDay']:x['EndDay']].mean(), axis=1)
print(df2)
Prints:
Group StartDay EndDay Mean
0 1 2023-01-01 2023-01-05 2.75
1 2 2023-01-03 2023-01-10 4.00