Search code examples
pythonpandasdataframeaggregateweighted-average

Grouping columns of dataframe by other dataframe and calculate weighted average of aggregated columns


I have a DataFrame df1 with four time series. I want to aggregate these time-series based on the DataFrame groups which classifies the individual time-series into two groups. Additionally, I have a DataFrame weights, which defines the weighting factor (change over time) of the time-series within the groups.

What I try to get is a DataFrame df2 which has the time-series aggregated based on the groups and weighted with the weighting factors.

This would be a simple example:

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[1.2, 2.3, 0.3, 0.5], 
    '02K W':[3.5, 0.1, 'nan', 'nan'], 
    '03K W':[4.2, 5.2, 2.5, 3.0], 
    '04K W':[1.5, 2.6, 8.2, 4.2]}) 

groups = pd.DataFrame({
    'ID':['01K W', '02K W', '03K W', '04K W'],
    'Group':['Group1', 'Group1', 'Group2', 'Group1']}) 

weights = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[0.5, 0.5, 0.25, 0.5], 
    '02K W':[0.25, 0.25, 'nan', 'nan'], 
    '03K W':[1, 1, 1, 1], 
    '04K W':[0.25, 0.25, 0.75, 0.5]}) 

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Group1':[1.85, 1.82, 6.23, 2.35], 
    'Group2':[4.2, 5.2, 2.5, 3.0]})

The first element in df2 (=1.85) is calculated as: 1.2x0.5 + 3.5x0.25 + 1.5x0.25

Values of group 1: 1.2, 3.5 & 1.5 ('2021-01-01')

Weighting factors for group 1: 0.5, 0.25 & 0.25 ('2021-01-01')


Solution

    1. Multiply df1 with weights
    2. stack and map the column names to the groups
    3. groupby "Date" and group ("level_1") and sum
    4. unstack and format to the desired output
    output = df1.set_index("Date").mul(weights.set_index("Date")).stack().reset_index(1)
    output = (output.groupby([output.index, 
                              output["level_1"].map(dict(zip(groups["ID"],groups["Group"])))])
              .sum()
              .unstack()
              .droplevel(0,1)
              .rename_axis(None, axis=1)
              )
    
    >>> output
                Group1  Group2
    Date                      
    2021-01-01   1.850     4.2
    2021-01-02   1.825     5.2
    2021-01-03   6.225     2.5
    2021-01-04   2.350     3.0