Search code examples
pythonsqlpostgresqlscipypercentile

How to Get The Out-Of-Set Percentile In PostgreSQL?


Given a tables

product1_id | score
A  |   2 
B  |   3

and

product2_id | score
W  |   1
X  |   2
Y  |   3
Z  |   4

how can I find the out-of-set percentiles of product1 scores among product2 scores using PostgreSQL to get an expected output of:

product1_id | score | out_of_set_percentile
A  |   2  |  50
B  |   3  |  75.

In python one way to solve this would be to merge the tables and apply scipy.percentileofscore:

from scipy import stats
stats.percentileofscore([1, 2, 3, 4], 3)  # 75.0,

but I'd like a way to do this natively in PostgreSQL


Solution

  • Here is a brute force method:

    select t1.product_id, t1.score,
           avg( (t2.score <= t1.score)::int ) as ratio
    from t1 cross join
         t2
    group by t1.product_id, t1.score;