pandas - ranking with tolerance?


Is there a way to rank values in a DataFrame while treating values that lie within a given tolerance of each other as ties?

Say I have the following values:

ex = pd.Series([16.52,19.95,16.15,22.77,20.53,19.96])

If I run rank on it, I get:

ex.rank(method='average') 
0    2.0 
1    3.0 
2    1.0 
3    6.0 
4    5.0
5    4.0 
dtype: float64

But what I'd like as a result would be (with a tolerance of 0.01):

0    2.0 
1    3.5 
2    1.0 
3    6.0 
4    5.0
5    3.5 

Any way to define this tolerance?

Thanks


Solution

  • This function may work:

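    import pandas as pd

    # The extra 1e-10 keeps gaps that are nominally 0.01 inside the tolerance despite float rounding.
    def rank_with_tolerance(sr, tolerance=0.01+1e-10, method='average'):
        # Sorted unique values, indexed by themselves so they work as a lookup table.
        vals = pd.Series(sr.unique()).sort_values()
        vals.index = vals
        # If a value is within `tolerance` of the next-smaller unique value,
        # replace it with that value so the two rank as a tie.
        vals = vals.mask(vals - vals.shift(1) <= tolerance, vals.shift(1))
        # Map every element to its replacement, then rank as usual.
        return sr.map(vals).fillna(sr).rank(method=method)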
    

    It works for your given input:

    ex = pd.Series([16.52,19.95,16.15,22.77,20.53,19.96])
    rank_with_tolerance(ex, tolerance=0.01+1e-10, method='average')
    
    # result:
    0    2.0
    1    3.5
    2    1.0
    3    6.0
    4    5.0
    5    3.5
    dtype: float64
    

    It also seems to work with more complex input:

    ex = pd.Series([16.52,19.95,19.96, 19.95, 19.97, 19.97, 19.98])
    rank_with_tolerance(ex, tolerance=0.01+1e-10, method='average')
    
    # result:
    0    1.0
    1    3.0
    2    3.0
    3    3.0
    4    5.5
    5    5.5
    6    7.0
    dtype: float64
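
Note that in the last result 19.95 and 19.96 share a rank, while 19.97 and 19.98 get their own ranks even though each is within 0.01 of its neighbour. That is because the function only replaces a value with the next-smaller unique value, so a chain of close values is not collapsed into a single tie. If you would rather treat the whole chain as one group, something like the sketch below might do it. The name rank_with_chained_tolerance is made up for illustration and this is a different grouping rule from the function above, not a correction of it:

```python
import pandas as pd

def rank_with_chained_tolerance(sr, tolerance=0.01 + 1e-10, method="average"):
    """Hypothetical variant: values linked by a chain of gaps <= tolerance share one rank."""
    # Sorted unique values; NaNs are dropped here and stay NaN in the result.
    vals = pd.Series(sr.dropna().unique()).sort_values()
    # A new group starts wherever the gap to the previous sorted value exceeds the tolerance.
    group_id = (vals.diff() > tolerance).cumsum()
    # Lookup table from each unique value to its group number.
    lookup = pd.Series(group_id.to_numpy(), index=vals.to_numpy())
    # Group numbers increase with the values, so ranking them ranks the groups.
    return sr.map(lookup).rank(method=method)

ex = pd.Series([16.52, 19.95, 16.15, 22.77, 20.53, 19.96])
print(rank_with_chained_tolerance(ex).tolist())
# [2.0, 3.5, 1.0, 6.0, 5.0, 3.5]

ex = pd.Series([16.52, 19.95, 19.96, 19.95, 19.97, 19.97, 19.98])
print(rank_with_chained_tolerance(ex).tolist())
# [1.0, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5]
```

For the input from the question both approaches give the same ranks. They only differ when more than two distinct values sit within the tolerance of each other, so which one you want depends on what the tolerance is supposed to mean for your data.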