
Calculate weighted average with Pandas for decreasing cost


I am implementing a ranking system. One of the criteria is a field called site_fees, which accounts for 10% of the total score, and a site fee of 0 gets all 10 points. What I want to do is calculate how many points sites with non-zero fees should get, but I am struggling to do so.

My initial approach was to split the dataframe into two dataframes (dfb, where site_fees is 0, and dfa, where it is > 0), calculate the rating for dfa, assign a rating of 10 to dfb, and then concatenate the two.

The code is as follows:

    dfSitesa = dfSites[dfSites['site_fees'].notnull()]
    dfSitesb = dfSites[dfSites['site_fees'].isnull()]
    dfSitesa['rating'] = FeeWeight * \
        dfSitesa['site_fees'].min()/dfSitesa['site_fees']
    dfSitesb['rating'] = FeeWeight
    dfSites = pd.concat([dfSitesa,dfSitesb])

This produces output, but the ratings in dfa are not correct: the minimum fee in dfa is 5000 rather than 0, so a site with $5000 in fees gets a rating of 10 (the maximum), which is not what I want. What am I doing wrong?
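To see the effect, here is a small sketch that applies the question's formula to the three non-zero fees from the expected table below (the `fees` Series is hypothetical sample data). Because each rating is relative to the smallest fee, the cheapest non-zero site always receives the full weight, and the points fall off as 1/fee rather than linearly.

```python
import pandas as pd

FeeWeight = 10  # points for the fee criterion, as in the question

# Hypothetical non-zero fees covering the range mentioned in the question
fees = pd.Series([5000, 10000, 15000], name="site_fees")

# The question's formula: weight * (smallest fee / this fee)
rating = FeeWeight * fees.min() / fees

print(rating.round(2).tolist())  # [10.0, 5.0, 3.33]
```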

The minimum non-zero site_fee is 5000 and the maximum is 15000. Based on this, I would expect ratings along these lines:

site_fees | rating
15000     | 0
10000     | 3.3
5000      | 6.7
0         | 10
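One reading of the expected table (up to rounding) is a straight line that gives the full weight W = 10 at a fee of 0 and nothing at the maximum fee of 15000:

```latex
\text{rating} = W\left(1 - \frac{\text{fee}}{\text{fee}_{\max}}\right),
\qquad W = 10,\quad \text{fee}_{\max} = 15000
```

For example, a fee of 10000 gives 10(1 - 10000/15000) ≈ 3.33 and a fee of 5000 gives 10(1 - 5000/15000) ≈ 6.67. Unlike the `min/fee` ratio in the question's code, this scale does not depend on the smallest non-zero fee.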

Solution

  Here is one way to do it, using min-max scaling so that the lowest fee gets FeeWeight and the highest fee gets 0:

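    import pandas as pd

    dfSites = pd.DataFrame({'site_fees': [0, 1, 2, 3, 5]})
    FeeWeight = 10

    # .copy() so the 'rating' columns are added to independent frames,
    # not to slices of dfSites (avoids pandas' SettingWithCopyWarning)
    dfSitesa = dfSites[dfSites['site_fees'].notnull()].copy()
    dfSitesb = dfSites[dfSites['site_fees'].isnull()].copy()
    dfSitesb['rating'] = FeeWeight
    # min-max scaling: the smallest fee gets FeeWeight, the largest gets 0
    fee_min = dfSitesa['site_fees'].min()
    factor = dfSitesa['site_fees'].max() - fee_min
    dfSitesa['rating'] = FeeWeight * (1 - (dfSitesa['site_fees'] - fee_min) / factor)
    dfSites = pd.concat([dfSitesa, dfSitesb])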

    In [1]: print(dfSites)
       site_fees  rating
    0          0    10.0
    1          1     8.0
    2          2     6.0
    3          3     4.0
    4          5     0.0
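Note that min-max scaling maps the smallest fee present in dfSitesa to the full weight. In the question's setup the zero-fee sites are split off first, so that minimum would be 5000 and a $5000 site would still score 10. To reproduce the expected table (0 → 10, 15000 → 0), a minimal sketch anchors the scale at 0 instead of at the minimum. It assumes, as the question's `isnull()` split suggests, that a missing fee means no fee; the sample data is hypothetical. No split or concat is needed.

```python
import pandas as pd

FeeWeight = 10  # points for the fee criterion

# Hypothetical sample data; None stands for a site with no fee (assumption)
dfSites = pd.DataFrame({'site_fees': [15000, 10000, 5000, None, 0]})

# Treat missing fees as 0, then scale linearly: fee 0 -> FeeWeight, largest fee -> 0
fees = dfSites['site_fees'].fillna(0)
max_fee = fees.max()
# Guard against division by zero when every site has a fee of 0
dfSites['rating'] = FeeWeight * (1 - fees / max_fee) if max_fee > 0 else FeeWeight

print(dfSites)
```

With this data the ratings are 0, about 3.33, about 6.67, 10 and 10, matching the expected table up to rounding.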