
How to compute percentiles with numpy?


The scipy.stats module has a function called percentileofscore. To keep my package dependencies down, I would like to reproduce it as closely as possible with NumPy instead.

    import numpy as np
    from scipy.stats import percentileofscore

    a = np.array([3, 2, 1])

    np.percentile(a, a)
    # array([1.06, 1.04, 1.02])

    percentileofscore(a, a)
    # array([100.        ,  66.66666667,  33.33333333])

I'm not sure what NumPy is doing here, but these are not the percentiles I expected. How can I get the same behavior using built-in NumPy functions?
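For reference, np.percentile works in the opposite direction: its second argument q is a percentile between 0 and 100, and it returns the value of a at that percentile. So np.percentile(a, a) asks for the 3rd, 2nd and 1st percentiles of [1, 2, 3], which all sit just above the minimum. A minimal sketch reproducing those numbers by hand, assuming NumPy's default "linear" method (the variable names are only illustrative):

```python
import numpy as np

a = np.array([3, 2, 1])

# np.percentile(a, q) interprets q as percentiles in [0, 100] and returns the
# values of `a` at those percentiles; here q = [3, 2, 1].
result = np.percentile(a, a)

# Same thing by hand with the "linear" method: the position in the sorted
# array is q / 100 * (n - 1), and values are linearly interpolated there.
sorted_a = np.sort(a)
positions = a / 100 * (len(a) - 1)  # [0.06, 0.04, 0.02]
manual = np.interp(positions, np.arange(len(a)), sorted_a)

print(result)  # [1.06 1.04 1.02]
print(manual)  # [1.06 1.04 1.02]
```

In other words, np.percentile maps percentiles to values, while percentileofscore maps values to percentiles.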

Of note, percentileofscore averages percentiles for ties, and I want to preserve this behavior. For example, [100, 100] should return [50, 50], not [0, 100]. (That is what kind="mean" gives; the default kind="rank" gives [75, 75].)


Solution

  • You can take a look at the implementation in SciPy; it is rather simple (https://github.com/scipy/scipy/blob/v1.12.0/scipy/stats/_stats_py.py#L2407). Reproducing its kind="mean" branch in NumPy gives:

    import numpy as np
    from scipy.stats import percentileofscore
    
    random_state = np.random.default_rng(123)
    
    a = random_state.integers(0, 100, 100)
    scores = np.array([50, 80, 90])
    
    print(percentileofscore(a, scores, kind="mean"))
    
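    def percentile_of_score_np(x, scores):
        # scores must be a 1-D NumPy array; scores[:, None] broadcasts each
        # score against every element of x, giving shape (len(scores), len(x)).
        left = np.count_nonzero(x < scores[:, None], axis=-1)    # values strictly below each score
        right = np.count_nonzero(x <= scores[:, None], axis=-1)  # values at or below each score
        # Averaging the two counts splits ties evenly, as kind="mean" does.
        return (left + right) * (50.0 / len(x))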
    
    print(percentile_of_score_np(a, scores))
    

    Which prints:

    [55.  83.  92.5]
    [55.  83.  92.5]
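The broadcasting version builds a len(scores) × len(x) boolean array. A possible variant (a sketch, not SciPy's code) sorts x once and counts with np.searchsorted: side="left" gives the number of values strictly below a score and side="right" the number at or below it. It also covers the default kind="rank", whose formula here follows my reading of the SciPy source linked above, and which is what produced [100, 66.67, 33.33] in the question. The name percentile_of_score_sorted is hypothetical; NaN handling and the "strict"/"weak" kinds are left out.

```python
import numpy as np


def percentile_of_score_sorted(x, scores, kind="mean"):
    """Hypothetical helper: percentile rank of `scores` within `x`, NumPy only.

    Sorts `x` once and uses np.searchsorted instead of an
    (len(scores), len(x)) boolean matrix. Supports kind="mean" and kind="rank".
    """
    x = np.sort(np.asarray(x).ravel())
    scores = np.asarray(scores)  # scalars and arrays both work
    n = len(x)
    left = np.searchsorted(x, scores, side="left")    # count of x < score
    right = np.searchsorted(x, scores, side="right")  # count of x <= score
    if kind == "mean":
        return (left + right) * (50.0 / n)
    if kind == "rank":
        # Add 1 to the summed counts when the score actually occurs in x.
        return (left + right + (left < right)) * (50.0 / n)
    raise ValueError(f"unsupported kind: {kind!r}")


a = np.array([3, 2, 1])
print(percentile_of_score_sorted(a, a, kind="rank"))  # approx. [100, 66.67, 33.33]
print(percentile_of_score_sorted([100, 100], [100, 100]))  # [50. 50.]
print(percentile_of_score_sorted([100, 100], [100, 100], kind="rank"))  # [75. 75.]
```

Sorting costs O(n log n) once, and each lookup is O(log n), so this scales better than the broadcasting approach when both x and scores are large.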
    

    I hope this helps!