Search code examples
postgresqlquantilepercentile-cont

How to use percentile_conts with multiple quantiles in Postgres


I currently have a query that works like so:

select AVG(t2 - t1) as delay,
       percentile_cont(0.25) within group (order by (t2 - t1)) as q25,
       percentile_cont(0.5) within group (order by (t2 - t1)) as median,
       percentile_cont(0.75) within group (order by (t2 - t1)) as q75,
       p.bool1,
       p.cat1
from people p
group by p.bool1, p.cat1
order by p.cat1,p.bool1

However, I read on the postgres functions aggregation page: https://www.postgresql.org/docs/9.4/functions-aggregate.html

That I should be able to specify multiple quantiles:

percentile_cont(fractions) WITHIN GROUP (ORDER BY sort_expression)  double precision[]  double precision or interval    array of sort expression's type     multiple continuous percentile: returns an array of results matching the shape of the fractions parameter, with each non-null element replaced by the value corresponding to that percentile

I'd like to use this so I don't recalculate the t2 - t1 for every quantile. What's the right syntax to get multiple quantiles? Would I need a subquery?


Solution

  • I'd like to use this so I don't recalculate the t2 - t1 for every quantile

    Lateral join could help in such scenario:

    select AVG(t2 - t1) as delay,
           percentile_cont(0.25) within group (order by s.col) as q25,
           percentile_cont(0.5) within group (order by s.col) as median,
           percentile_cont(0.75) within group (order by s.col) as q75,
           p.bool1,
           p.cat1
    from people p
    ,LATERAL(SELECT t2 - t1 AS col) s
    group by p.bool1, p.cat1
    order by p.cat1,p.bool1;
    

    Related: PostgreSQL: using a calculated column in the same query


    Array are defined as: ARRAY[0.25, 0.5, 0.75] or '{0.25, 0.5, 0.75}'::double precision[]

    select AVG(t2 - t1) as delay,
       -- 1
       percentile_cont(ARRAY[0.25, 0.5, 0.75]) within group (order by (t2 - t1)) as q25,
       -- 2
       percentile_cont('{0.25, 0.5, 0.75}'::double precision[]) 
       within group (order by (t2 - t1)) as q
           p.bool1,
           p.cat1
    from people p
    group by p.bool1, p.cat1
    order by p.cat1,p.bool1;
    

    db<>fiddle demo


    Is there an easy way to inline specify the names of each of the resulting percentile fields as they had been with q25, q50, q75, etc

    WITH cte AS (
        select AVG(t2 - t1) as delay,
           percentile_cont(ARRAY[0.25, 0.5, 0.75]) within group (order by (t2 - t1)) as q,
               p.bool1,
               p.cat1
        from people p
        group by p.bool1, p.cat1
    )
    select cte.*, q[1] AS q25, q[2] AS q50, q[3] AS q75
    from cte
    order by cat1,bool1;
    

    db<>fiddle demo2