
How do you apply weights to DataFrame columns iteratively to compute a weighted average?


I have a DataFrame with several columns of float values. I want to assign a fractional weight to each column, compute the weighted average across those columns, and store the result as a new column in the same DataFrame.

Let's say the columns are s1, s2, s3, and I want to give them the weights w1, w2, w3, respectively.

I was able to do this manually while experimenting, with all the values written out by hand. But when I keep the columns and weights in lists and iterate over them, I get an error. My iterative code is below, followed by the manual code, which works but needs every value hard-coded.

Code which didn't work:

score_df["weighted_avg"] += weight * score_df[feature]

Manual code which worked, but not with lists:

df["weighted_scores"] = 0.5*df["s1"] + 0.25*df["s2"] + 0.25*df["s3"]
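
The question shows neither the surrounding loop nor the error message, so the cause can only be guessed. One common way for a `+=` line like the one above to fail is that the target column does not exist yet, in which case pandas raises a `KeyError` on the first pass. The following is a minimal sketch of the iterative version under that assumption; the sample values and the `features` and `weights` lists are made up for illustration.

```python
import pandas as pd

# Illustrative data; the real score_df is not shown in the question.
score_df = pd.DataFrame({"s1": [1.0, 2.0], "s2": [3.0, 4.0], "s3": [5.0, 6.0]})

# Assumed list format: column names and their weights, in matching order.
features = ["s1", "s2", "s3"]
weights = [0.5, 0.25, 0.25]

# Create the accumulator column first so that += has something to add to.
score_df["weighted_avg"] = 0.0

# Add each column's weighted contribution in turn.
for feature, weight in zip(features, weights):
    score_df["weighted_avg"] += weight * score_df[feature]

print(score_df)
```

Because these weights sum to 1, the accumulated column is already the weighted average and needs no further division.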

Solution

  • We can use broadcasting for this, since weights has one entry per column, so its length matches the column axis:

    import numpy as np
    import pandas as pd
    
    # given the following example df
    df = pd.DataFrame(np.random.rand(10,3), columns=["s1", "s2", "s3"])
    
    print(df)
        s1   s2   s3
    0 0.49 1.00 0.50
    1 0.65 0.87 0.75
    2 0.45 0.85 0.87
    3 0.91 0.53 0.30
    4 0.96 0.44 0.50
    5 0.67 0.87 0.24
    6 0.87 0.41 0.29
    7 0.06 0.15 0.73
    8 0.76 0.92 0.69
    9 0.92 0.28 0.29
    
    # one weight per column, in column order
    weights = [0.5, 0.25, 0.25]
    df["weighted_scores"] = df.mul(weights).sum(axis=1)
    
    print(df)
        s1   s2   s3  weighted_scores
    0 0.49 1.00 0.50             0.62
    1 0.65 0.87 0.75             0.73
    2 0.45 0.85 0.87             0.66
    3 0.91 0.53 0.30             0.66
    4 0.96 0.44 0.50             0.71
    5 0.67 0.87 0.24             0.61
    6 0.87 0.41 0.29             0.61
    7 0.06 0.15 0.73             0.25
    8 0.76 0.92 0.69             0.78
    9 0.92 0.28 0.29             0.60
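
The one-liner above assumes that every column in the DataFrame is a score column and that the weights already sum to 1. If either assumption does not hold, a variant along the following lines may help. It is only a sketch: the extra `name` column, the sample values, and the weight mapping are hypothetical. The weights are kept in a Series indexed by column name, so `mul` aligns them by label instead of by position, and the result is divided by the total weight.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a non-numeric column that must be left out.
df = pd.DataFrame({
    "name": ["a", "b"],
    "s1": [1.0, 2.0],
    "s2": [3.0, 4.0],
    "s3": [5.0, 6.0],
})

# Hypothetical weights keyed by column name; they need not sum to 1.
weights = pd.Series({"s1": 2.0, "s2": 1.0, "s3": 1.0})
cols = list(weights.index)

# Multiply only the weighted columns (aligned by label), sum across each
# row, then normalize by the total weight.
df["weighted_scores"] = df[cols].mul(weights, axis=1).sum(axis=1) / weights.sum()

# Same computation with NumPy, which normalizes by the weight sum itself.
df["weighted_np"] = np.average(df[cols].to_numpy(), axis=1, weights=weights.to_numpy())

print(df)
```

Selecting `df[cols]` before multiplying keeps unrelated columns out of the calculation, which also makes it safe to run the line again after the result column has been added.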