python, time-series, data-science, data-manipulation, weighted-average

How to calculate time weighted average using Python?


So I have data with irregular intervals in a day.

Event Time         Value
17-5-2021 03:00    84.9
17-5-2021 11:00    84.9
17-5-2021 15:00    84.7
17-5-2021 23:00    84.7
18-5-2021 03:00    84.5
18-5-2021 11:00    84.5
18-5-2021 15:00    84.9
18-5-2021 23:00    84.9

I want to calculate a time-weighted average of the above data using Python, because the value was only 84.7 for 37.5% of 17-5-2021 (9 hours out of 24), whereas a normal average would weight it at 50%.
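To make the difference concrete, here is the arithmetic the question describes, assuming (as the 37.5% figure above implies) that 84.7 held for 9 of the 24 hours; the split is the questioner's, not derived here:

```python
# Simple average of the four readings on 17-5-2021: each reading counts equally.
readings = [84.9, 84.9, 84.7, 84.7]
simple_avg = sum(readings) / len(readings)  # 84.8

# Time-weighted average: weight each value by how long it was in effect.
# Assumption from the question: 84.9 held for 15 hours, 84.7 for 9 hours.
time_weighted = (84.9 * 15 + 84.7 * 9) / 24  # 84.825

print(simple_avg, round(time_weighted, 3))
```

The simple average treats each reading equally; the time-weighted one gives 84.7 only its 9-hour share.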

Assumption: if we don't have a value for a particular interval, the last available value is taken, e.g. the value at 17-5-2021 04:00 is 84.9, as that was the last available value. Any input would be helpful, as I am not able to figure out the right way to approach this. Expected output:

Please see the image for the calculation.

Final result

Event Time    Weighted Average
17-5-2021     84.79166
18-5-2021     84.71666
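The carry-forward assumption stated above is what pandas calls a forward fill; a minimal sketch of just that rule, putting two of the readings onto an hourly grid with resample (the series here is hypothetical inline data, not the full file):

```python
import pandas as pd

# Two of the 17-5-2021 readings, indexed by event time
s = pd.Series(
    [84.9, 84.7],
    index=pd.to_datetime(["2021-05-17 03:00", "2021-05-17 15:00"]),
)

# Resample to an hourly grid and carry the last available value forward
hourly = s.resample("h").ffill()

# Per the assumption, the value at 04:00 is 84.9 (last seen at 03:00)
print(hourly.loc["2021-05-17 04:00"])  # 84.9
```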

Solution

  • I think you can use the pandas diff, groupby and rolling functions to achieve this, with the following steps:

    1. Convert Event Time into datetime.
    2. Calculate the time difference between consecutive times with diff, convert it to seconds with total_seconds, and divide by 3600 to get hours.
    3. Compute the weighted values as the product of Value and the time difference.
    4. Compute the weighted average with rolling, using a window of 2, and divide by the number of hours in the period (here 12 hours).
    5. Compute the daily average of the weighted values with groupby and transform; the day starts at 12 AM.
    6. Compute the rolling daily average by setting a DatetimeIndex and passing a window of 1D.
    import pandas as pd
    
    df = pd.read_csv('test.csv')
    df['Event Time'] = pd.to_datetime(df['Event Time'])
    # time difference between consecutive readings, in hours
    df['Time Diff'] = df['Event Time'].diff(periods=1).dt.total_seconds()/3600
    # the first row has no previous reading, so assume a 4-hour interval
    df['Time Diff'] = df['Time Diff'].fillna(4)
    # You don't need the fillna step on large data; dropping the row would be better
    df['Weighted Value'] = df['Value']*df['Time Diff']
    # calculate the weighted average over the last two readings (a 12-hour period)
    df['Weighted Average'] = df['Weighted Value'].rolling(2).sum()/12
    # calculate the average for each calendar day; the day starts at 12 AM
    df['Daily Weighted Fixed Window'] = df.groupby(df['Event Time'].dt.date)['Weighted Value'].transform('sum')/24
    # calculate the weighted average for the last day (current time minus 24 hours)
    df.set_index('Event Time', inplace=True)
    df['Daily Weighted Rolling'] = df['Weighted Value'].rolling('1D').sum()/24
    
    Event Time            Value  Time Diff  Weighted Value  Weighted Average  Daily Weighted Fixed Window  Daily Weighted Rolling
    2021-05-17 03:00:00    84.9          4           339.6               nan                         84.8                   14.15
    2021-05-17 11:00:00    84.9          8           679.2              84.9                         84.8                   42.45
    2021-05-17 15:00:00    84.7          4           338.8           84.8333                         84.8                 56.5667
    2021-05-17 23:00:00    84.7          8           677.6              84.7                         84.8                    84.8
    2021-05-18 03:00:00    84.5          4           338.0           84.6333                         84.7                 84.7333
    2021-05-18 11:00:00    84.5          8           676.0              84.5                         84.7                    84.6
    2021-05-18 15:00:00    84.9          4           339.6           84.6333                         84.7                 84.6333
    2021-05-18 23:00:00    84.9          8           679.2              84.9                         84.7                    84.7
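    An alternative sketch of the same idea, not the method above: forward-fill the readings onto an hourly grid with resample, then take a plain daily mean. The results differ slightly from the table above because the hourly grid only starts at the first reading (03:00 on day one), so the earlier hours of that day are not covered:

    ```python
    import pandas as pd

    # Same data as the question, inlined here instead of read from test.csv
    df = pd.DataFrame({
        "Event Time": pd.to_datetime([
            "2021-05-17 03:00", "2021-05-17 11:00",
            "2021-05-17 15:00", "2021-05-17 23:00",
            "2021-05-18 03:00", "2021-05-18 11:00",
            "2021-05-18 15:00", "2021-05-18 23:00",
        ]),
        "Value": [84.9, 84.9, 84.7, 84.7, 84.5, 84.5, 84.9, 84.9],
    })

    # Forward-fill onto an hourly grid, then average per calendar day
    hourly = df.set_index("Event Time")["Value"].resample("h").ffill()
    daily = hourly.groupby(hourly.index.date).mean()
    print(daily)  # one mean per day: 17th ~ 84.814 (21 hours), 18th ~ 84.675 (24 hours)
    ```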

    I have updated the answer. If you need anything more, let me know.