scipy.stats has a function called percentileofscore. To keep my package dependencies down, I would like to replicate it as closely as possible with NumPy instead.
import numpy as np
from scipy.stats import percentileofscore
a = np.array([3, 2, 1])
np.percentile(a, a)
>>>
array([1.06, 1.04, 1.02])
percentileofscore(a,a)
>>>
array([100. , 66.66666667, 33.33333333])
I'm not sure what it is that NumPy is doing, but it's not returning the percentiles I expect. How can I achieve the same functionality using built-in NumPy methods?
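For context, np.percentile answers the inverse question. Its second argument q is a percentile between 0 and 100, and it returns the data value at that percentile, interpolating linearly between the sorted values by default. So np.percentile(a, a) treats 3, 2 and 1 as the 3rd, 2nd and 1st percentiles of [1, 2, 3], and all three land just above the minimum. A small check of that reading:

```python
import numpy as np

a = np.array([3, 2, 1])

# q is read as percentiles: 3%, 2%, 1% of the sorted data [1, 2, 3].
# With the default linear method the position is q/100 * (n - 1),
# e.g. 0.03 * 2 = 0.06, so the first result is 1 + 0.06 * (2 - 1) = 1.06.
values = np.percentile(a, a)
print(values)

# The 0th, 50th and 100th percentiles are the min, median and max.
print(np.percentile(a, [0, 50, 100]))
```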
Note that, by default, percentileofscore averages the percentiles of tied values, and I want to preserve this behavior. For example, [100, 100] should not return [0, 100] but [50, 50].
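To make the tie rule concrete: the share of values strictly below a score and the share at or below it differ exactly when the score is tied with values in the data, and averaging the two gives the [50, 50] result above. That is the formula SciPy uses for kind="mean". SciPy's default, kind="rank", uses a slightly different rule (it adds one to the combined count when the score occurs in the data) and returns 75 for this input. A minimal NumPy sketch of the three quantities:

```python
import numpy as np

x = np.array([100, 100])
score = 100

# Share of values strictly below the score (SciPy's kind="strict").
strict = np.count_nonzero(x < score) * 100.0 / len(x)
# Share of values at or below the score (SciPy's kind="weak").
weak = np.count_nonzero(x <= score) * 100.0 / len(x)
# Averaging the two splits the ties evenly (SciPy's kind="mean").
mean = (strict + weak) / 2

print(strict, weak, mean)  # 0.0 100.0 50.0
```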
You can take a look at the implementation in SciPy; it is rather simple (https://github.com/scipy/scipy/blob/v1.12.0/scipy/stats/_stats_py.py#L2407). Reproducing it in NumPy gives:
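import numpy as np
from scipy.stats import percentileofscore

random_state = np.random.default_rng(123)
a = random_state.integers(0, 100, 100)
scores = np.array([50, 80, 90])

# Reference result from SciPy
print(percentileofscore(a, scores, kind="mean"))


def percentile_of_score_np(x, scores):
    """NumPy-only equivalent of percentileofscore(x, scores, kind="mean")."""
    x = np.asarray(x)
    scores = np.asarray(scores)
    # Compare every score with every element of x, then count along x
    left = np.count_nonzero(x < scores[..., None], axis=-1)    # values strictly below
    right = np.count_nonzero(x <= scores[..., None], axis=-1)  # values at or below
    # Averaging the two counts splits ties evenly, as kind="mean" does
    return (left + right) * (50.0 / len(x))


print(percentile_of_score_np(a, scores))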
Which prints:
[55. 83. 92.5]
[55. 83. 92.5]
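If memory matters, the same counts can be obtained with np.searchsorted on a sorted copy of x instead of building a len(scores)-by-len(x) comparison matrix, and the other kind options are easy to add. This is a sketch under my own assumptions, not SciPy's code: the name percentile_of_score_sorted is hypothetical, NaN handling (SciPy's nan_policy) is left out, and it only mirrors the four kind formulas from SciPy's source. Note that kind="rank" is SciPy's default, which is what produced [100, 66.67, 33.33] in the question, while kind="mean" is the tie-averaging rule used above.

```python
import numpy as np


def percentile_of_score_sorted(x, scores, kind="mean"):
    """Hypothetical NumPy-only analogue of scipy.stats.percentileofscore.

    Counts with binary search on a sorted copy of x. NaNs are not handled.
    """
    x = np.sort(np.asarray(x).ravel())
    scores = np.asarray(scores)
    n = len(x)
    # Number of values strictly below each score.
    left = np.searchsorted(x, scores, side="left")
    # Number of values at or below each score.
    right = np.searchsorted(x, scores, side="right")
    if kind == "rank":
        # SciPy's default: add 1 when the score occurs in x.
        return (left + right + (left < right)) * (50.0 / n)
    if kind == "strict":
        return left * (100.0 / n)
    if kind == "weak":
        return right * (100.0 / n)
    if kind == "mean":
        return (left + right) * (50.0 / n)
    raise ValueError(f"unknown kind: {kind!r}")


a = np.array([3, 2, 1])
print(percentile_of_score_sorted(a, a, kind="rank"))  # roughly [100, 66.67, 33.33]
print(percentile_of_score_sorted([100, 100], [100, 100]))  # [50. 50.]
```

Sorting costs O(n log n) once and each score is then a binary search, so this scales better than the broadcasting version when both x and scores are large.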
I hope this helps!