So i have data with irregular intervals in a day.
Event Time | Value |
---|---|
17-5-2021 03:00 | 84.9 |
17-5-2021 11:00 | 84.9 |
17-5-2021 15:00 | 84.7 |
17-5-2021 23:00 | 84.7 |
18-5-2021 03:00 | 84.5 |
18-5-2021 11:00 | 84.5 |
18-5-2021 15:00 | 84.9 |
18-5-2021 23:00 | 84.9 |
I want to calculate time weighted average using python on the above data as value was only 83.7 for 37.5% (9 hours out of 24) where as if calculate normal average it will be accounted for 50% for 17-5-2021.
Assumption: If we don't have value for particular interval then last available value is taken eg: value at 17-5-2021 04:00 is 84.9 as that was the last available value. Any input would be helpful as I am not able to figure a right way to approach this. Expected output:
Please see the image for Calculation
Final result
Event Time | Weighted Average |
---|---|
17-5-2021 | 84.79166 |
18-5-2021 | 84.71666 |
I think you can use pandas diff, groupby and rolling function to achieve this. You can use the following steps to achieve this:
import pandas as pd
df = pd.read_csv('test.csv')
df['Event Time'] = pd.to_datetime(df['Event Time'])
df['Time Diff'] = df['Event Time'].diff(periods=1).dt.total_seconds()/3600
df['Time Diff'] = df['Time Diff'].fillna(4)
# You dont need to do the above step in large data. Dropping would be better for large data
df['Weighted Value'] = df['Value']*df['Time Diff']
# calculate the weighted average based on number of periods
df['Weighted Average'] = df['Weighted Value'].rolling(2).sum()/12
# calculate average for each day.day starts at 12AM
df['Daily Weighted Fixed Window'] = df.groupby(df['Event Time'].dt.date)['Weighted Value'].transform('sum')/24
# calculate the weighted average for last one day (stats from current time minus 24 hours)
df.set_index('Event Time', inplace=True)
df['Daily Weighted Rolling'] = df['Weighted Value'].rolling('1D').sum()/24
Event Time | Value | Time Diff | Weighted Value | Weighted Average | Daily Weighted Fixed Window | Daily Weighted Rolling |
---|---|---|---|---|---|---|
2021-05-17 03:00:00 | 84.9 | 4 | 339.6 | nan | 84.8 | 14.15 |
2021-05-17 11:00:00 | 84.9 | 8 | 679.2 | 84.9 | 84.8 | 42.45 |
2021-05-17 15:00:00 | 84.7 | 4 | 338.8 | 84.8333 | 84.8 | 56.5667 |
2021-05-17 23:00:00 | 84.7 | 8 | 677.6 | 84.7 | 84.8 | 84.8 |
2021-05-18 03:00:00 | 84.5 | 4 | 338 | 84.6333 | 84.7 | 84.7333 |
2021-05-18 11:00:00 | 84.5 | 8 | 676 | 84.5 | 84.7 | 84.6 |
2021-05-18 15:00:00 | 84.9 | 4 | 339.6 | 84.6333 | 84.7 | 84.6333 |
2021-05-18 23:00:00 | 84.9 | 8 | 679.2 | 84.9 | 84.7 | 84.7 |
I have update the answer. If you need anything more let me know.