Search code examples
postgresqlstatisticsaggregate-functionspercentile

Get percent rank for a given value in a given table column


I have a Postgres table with about 500k rows. One of the columns called score has values ranging from 0-1. The data is not normally distributed.

Say I have an observation of 0.25. I'd like to find out where this would fall in the distribution of the score column. This is sometimes referred to as the percent rank.

E.G. a value of 0.25 is in the 40th percentile. This would mean that a value of 0.25 is larger than 40% of the observations in the table.

I know I can calculate a frequency distribution with something like below but this feel like overkill when all I want is a percentile value.

select k, percentile_disc(k) within group (order by mytable.score)
from mytable, generate_series(0.01, 1, 0.01) as k
group by k

Solution

  • Sounds like you want the hypothetical-set aggregate function percent_rank():

    SELECT percent_rank(0.25) WITHIN GROUP (ORDER BY score)
    FROM   mytable;
    

    The manual:

    Computes the relative rank of the hypothetical row, that is (rank - 1) / (total rows - 1). The value thus ranges from 0 to 1 inclusive.