Search code examples
sqlpostgresqlpercentilepercentile-cont

Is this a valid PERCENTILE_CONT SQL query?


I am trying to run a SQL query to find a 50th percentile in a table within a certain group, but then i am also grouping the result over the same field. Here is my query, for example over the tpch's nation table:

SELECT 
   regionkey,
   PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY regionkey)
FROM "tpch/nation"
GROUP BY regionkey
ORDER BY regionkey

Is this a valid query? I am trying to QA my code by running all kinds of different percentile queries. Postgres returns this result for the above query,:

 regionkey    | percentile_cont 
 -------------+-----------------
            0 |               0
            1 |               1
            2 |               2
            3 |               3
            4 |               4
 (5 rows)

But my question is in the real world, would someone would try to run such type of queries? I am new to aggregate functions, specially the percentile functions.


Solution

  • You would use percentile_cont() to get a percentage of some ordered value. For instance, if you had a population column for the region, then you would calculate the median population as:

    SELECT regionkey,
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY population)
    FROM "tpch/nation"
    GROUP BY regionkey
    ORDER BY regionkey;
    

    Using regionkey in the ORDER BY is non-sensical. Within each group, regionkey has only one value. Hence, the PERCENTILE_CONT() can only return that value.

    However, it can be quite useful with almost any other column.