
Which method does pandas use for percentile?


I was trying to understand how pandas calculates the lower and upper percentiles and got a bit confused. Here is the sample code and its output.

import pandas as pd
test = pd.Series([7, 15, 36, 39, 40, 41])
test.describe()

Output:

[Image: output of test.describe()]

I am interested only in the 25% and 75% percentiles. Which method does pandas use to calculate them?

According to the Wikipedia article on quartiles (https://en.wikipedia.org/wiki/Quartile), the results are different:

[Image: quartile results from the methods in the Wikipedia article]

So which statistical/mathematical method does pandas use to calculate percentiles?


Solution

  • As I mentioned in the comments, I finally figured out how it works by experimenting with the quantile function (from pandas.core.algorithms import quantile), as @Abdou suggested.

    I am not good at explaining this in writing, so I will only walk through the 25% and 75% percentiles of the given example. Here is a brief explanation:

    For the example list [7, 15, 36, 39, 40, 41], the sorted values sit at the following quantile positions:

    7 -> 0%

    15 -> 20%

    36 -> 40%

    39 -> 60%

    40 -> 80%

    41 -> 100%
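
The mapping above can be reproduced with a small sketch. It assumes that the i-th sorted value (counting from 0) sits at position i / (n - 1), which is what the list above implies; the variable names are only illustrative.

```python
values = [7, 15, 36, 39, 40, 41]
n = len(values)

# Assumption: the sorted values are spread evenly over 0%..100%,
# so the i-th value (0-based) sits at quantile position i / (n - 1).
positions = {v: i / (n - 1) for i, v in enumerate(sorted(values))}

for value, position in positions.items():
    print(f"{value} -> {position:.0%}")
```

With six values there are five equal gaps, so each gap covers 20%.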

    Since we want the 25th percentile, it lies between 15 (20%) and 36 (40%). It sits 5% past the 20% mark, so the value is 15 + (36-15)/4 = 15 + 5.25 = 20.25.

    (36-15)/4 is used because the distance between 15 and 36 covers 40% - 20% = 20%, so dividing it by 4 gives the step for 5%.

    The 75th percentile is found the same way. It sits 15% past the 60% mark, which is three quarters of the way from 39 to 40:

    39 + 3*(40-39)/4 = 39.75
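
The arithmetic for both percentiles can be written as one small function. This is only a sketch of linear interpolation between neighboring values as described above, not pandas' actual source; linear_quantile is a hypothetical helper name.

```python
import math

def linear_quantile(values, q):
    """Hypothetical helper: quantile via linear interpolation between neighbors."""
    data = sorted(values)
    pos = q * (len(data) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)        # index of the neighbor below
    hi = math.ceil(pos)         # index of the neighbor above
    frac = pos - lo             # how far we are from lo towards hi
    return data[lo] + (data[hi] - data[lo]) * frac

sample = [7, 15, 36, 39, 40, 41]
q25 = linear_quantile(sample, 0.25)  # index 1.25: a quarter of the way from 15 to 36
q75 = linear_quantile(sample, 0.75)  # index 3.75: three quarters of the way from 39 to 40
print(q25, q75)
```

For this example the function gives the same 20.25 and 39.75 as the hand calculation.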

    That's it. I am sorry for the poor explanation.

    NOTE: Thank you @shin for the correction mentioned in the comment.
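
To check this against pandas itself, Series.quantile accepts an interpolation argument whose default is "linear". The sketch below compares the default with the other documented options on the same series; the results dictionary is just an illustrative name.

```python
import pandas as pd

test = pd.Series([7, 15, 36, 39, 40, 41])

# Compute the 25% and 75% quantiles under each interpolation option.
# "linear" is the default and corresponds to the hand calculation.
results = {
    method: test.quantile([0.25, 0.75], interpolation=method).tolist()
    for method in ("linear", "lower", "higher", "midpoint", "nearest")
}

for method, quartiles in results.items():
    print(method, quartiles)
```

Only the "linear" option yields 20.25 and 39.75. The other options pick or average the neighboring data points instead, which may be one reason other quartile definitions give different numbers.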