python, time-series, data-science, data-manipulation, weighted-average

How to calculate time weighted average using Python?


So I have data with irregular intervals in a day.

Event Time         Value
17-5-2021 03:00    84.9
17-5-2021 11:00    84.9
17-5-2021 15:00    84.7
17-5-2021 23:00    84.7
18-5-2021 03:00    84.5
18-5-2021 11:00    84.5
18-5-2021 15:00    84.9
18-5-2021 23:00    84.9

I want to calculate a time-weighted average of the above data using Python, because the value was only 84.7 for 37.5% of 17-5-2021 (9 hours out of 24), whereas a normal average would weight it at 50%.
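To make the difference concrete, here is the arithmetic the question describes, assuming (as the 37.5% figure above implies) that 84.7 held for 9 of the 24 hours; the split is the questioner's, not derived here:

```python
# Simple average of the four readings on 17-5-2021: each reading counts equally.
readings = [84.9, 84.9, 84.7, 84.7]
simple_avg = sum(readings) / len(readings)  # 84.8

# Time-weighted average: weight each value by how long it was in effect.
# Assumption from the question: 84.9 held for 15 hours, 84.7 for 9 hours.
time_weighted = (84.9 * 15 + 84.7 * 9) / 24  # 84.825

print(simple_avg, round(time_weighted, 3))
```

The simple average treats each reading equally; the time-weighted one gives 84.7 only its 9-hour share.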

Assumption: if we don't have a value for a particular interval, the last available value is taken, e.g. the value at 17-5-2021 04:00 is 84.9, as that was the last available value. Any input would be helpful, as I am not able to figure out the right way to approach this. Expected output:

Please see the image for the calculation.

Final result

Event Time    Weighted Average
17-5-2021     84.79166
18-5-2021     84.71666
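The carry-forward assumption stated above is what pandas calls a forward fill; a minimal sketch of just that rule, putting two of the readings onto an hourly grid with resample (the series here is hypothetical inline data, not the full file):

```python
import pandas as pd

# Two of the 17-5-2021 readings, indexed by event time
s = pd.Series(
    [84.9, 84.7],
    index=pd.to_datetime(["2021-05-17 03:00", "2021-05-17 15:00"]),
)

# Resample to an hourly grid and carry the last available value forward
hourly = s.resample("h").ffill()

# Per the assumption, the value at 04:00 is 84.9 (last seen at 03:00)
print(hourly.loc["2021-05-17 04:00"])  # 84.9
```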

Solution

  • I think you can use the pandas diff, groupby and rolling functions to achieve this, with the following steps:

    1. Convert Event Time into datetime.
    2. Calculate the time difference between consecutive times with diff, convert it to seconds with total_seconds, and divide by 3600 to get hours.
    3. Compute the weighted values as the product of Value and the time difference.
    4. Compute the weighted average with rolling, using a window of 2, and divide by the number of hours in the period (here 12 hours).
    5. Compute the daily average of the weighted values with groupby and transform; the day starts at 12 AM.
    6. Compute the rolling daily average by setting a DatetimeIndex and passing a window of 1D.
    import pandas as pd
    
    df = pd.read_csv('test.csv')
    df['Event Time'] = pd.to_datetime(df['Event Time'])
    # time difference between consecutive readings, in hours
    df['Time Diff'] = df['Event Time'].diff(periods=1).dt.total_seconds()/3600
    # the first row has no previous reading, so assume a 4-hour interval
    df['Time Diff'] = df['Time Diff'].fillna(4)
    # You don't need the fillna step on large data; dropping the row would be better
    df['Weighted Value'] = df['Value']*df['Time Diff']
    # calculate the weighted average over the last two readings (a 12-hour period)
    df['Weighted Average'] = df['Weighted Value'].rolling(2).sum()/12
    # calculate the average for each calendar day; the day starts at 12 AM
    df['Daily Weighted Fixed Window'] = df.groupby(df['Event Time'].dt.date)['Weighted Value'].transform('sum')/24
    # calculate the weighted average for the last day (current time minus 24 hours)
    df.set_index('Event Time', inplace=True)
    df['Daily Weighted Rolling'] = df['Weighted Value'].rolling('1D').sum()/24
    
    Event Time            Value  Time Diff  Weighted Value  Weighted Average  Daily Weighted Fixed Window  Daily Weighted Rolling
    2021-05-17 03:00:00    84.9          4           339.6               nan                         84.8                   14.15
    2021-05-17 11:00:00    84.9          8           679.2              84.9                         84.8                   42.45
    2021-05-17 15:00:00    84.7          4           338.8           84.8333                         84.8                 56.5667
    2021-05-17 23:00:00    84.7          8           677.6              84.7                         84.8                    84.8
    2021-05-18 03:00:00    84.5          4           338.0           84.6333                         84.7                 84.7333
    2021-05-18 11:00:00    84.5          8           676.0              84.5                         84.7                    84.6
    2021-05-18 15:00:00    84.9          4           339.6           84.6333                         84.7                 84.6333
    2021-05-18 23:00:00    84.9          8           679.2              84.9                         84.7                    84.7
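    An alternative sketch of the same idea, not the method above: forward-fill the readings onto an hourly grid with resample, then take a plain daily mean. The results differ slightly from the table above because the hourly grid only starts at the first reading (03:00 on day one), so the earlier hours of that day are not covered:

    ```python
    import pandas as pd

    # Same data as the question, inlined here instead of read from test.csv
    df = pd.DataFrame({
        "Event Time": pd.to_datetime([
            "2021-05-17 03:00", "2021-05-17 11:00",
            "2021-05-17 15:00", "2021-05-17 23:00",
            "2021-05-18 03:00", "2021-05-18 11:00",
            "2021-05-18 15:00", "2021-05-18 23:00",
        ]),
        "Value": [84.9, 84.9, 84.7, 84.7, 84.5, 84.5, 84.9, 84.9],
    })

    # Forward-fill onto an hourly grid, then average per calendar day
    hourly = df.set_index("Event Time")["Value"].resample("h").ffill()
    daily = hourly.groupby(hourly.index.date).mean()
    print(daily)  # one mean per day: 17th ~ 84.814 (21 hours), 18th ~ 84.675 (24 hours)
    ```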

    I have updated the answer. If you need anything more, let me know.