Search code examples
pythonpandasdataframefunctionhierarchy

Python: define function to get the weighted median


I have the following dataframe:

df1 = pd.DataFrame(
{
"A_close": [10, 12, 15],
"B_close": [20, 19, 29],
"C_close": [23, 21, 4],
"D_close": [45, 47, 44],
},
index = ['01-01-2020', '01-02-2020', '01-03-2020']
)
df2 = pd.DataFrame(
{
"A_weight": [0.1, 0.3, 0.1],
"B_weight": [0.2, 0.1, 0.2],
"C_weight": [0.3, 0.4, 0.1],
"D_weight": [0.5, 0.2, 0.6],
},
index = ['01-01-2020', '01-02-2020', '01-03-2020']
)

df1 = df1.join(df2)
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.sort_index(axis=1)
df1:
            A               B               C               D
            close   weight  close   weight  close   weight  close   weight
01-01-2020  10      0.1     20      0.2     23      0.3     45      0.5
01-02-2020  12      0.3     19      0.1     21      0.4     47      0.2
01-03-2020  15      0.1     29      0.2     4       0.1     44      0.6

I have defined the following function:

def wmedian(dtfrm):
    df = dtfrm.unstack().sort_values('close')
    return df.loc[df['weight'].cumsum() > 0.5, 'close'].iloc[0]

However, when I call the function wmedian(df1), I get an error: KeyError: 'close'.

The outcome I am trying to get is

df:
             close
01-01-2020   23
01-02-2020   21
01-03-2020   44

What is missing in my function? I don't understand why I'm getting that error.


Solution

  • Try to stack only one level:

    wmedian = lambda x: x.loc[x['weight'].cumsum().gt(0.5), 'close'].head(1)
    out = df1.stack(level=0).groupby(level=0).apply(wmedian) \
             .reset_index(level=[1, 2], drop=True)
    

    Output:

    >>> out
    01-01-2020    23
    01-02-2020    21
    01-03-2020    44
    Name: close, dtype: int64
    
    >>> df1.stack(level=0)
                  close  weight
    01-01-2020 A     10     0.1
               B     20     0.2
               C     23     0.3
               D     45     0.5
    01-02-2020 A     12     0.3
               B     19     0.1
               C     21     0.4
               D     47     0.2
    01-03-2020 A     15     0.1
               B     29     0.2
               C      4     0.1
               D     44     0.6