I am trying to run a SQL query to find a 50th percentile in a table within a certain group, but then i am also grouping the result over the same field. Here is my query, for example over the tpch's nation table:
SELECT
regionkey,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY regionkey)
FROM "tpch/nation"
GROUP BY regionkey
ORDER BY regionkey
Is this a valid query? I am trying to QA my code by running all kinds of different percentile queries. Postgres returns this result for the above query,:
regionkey | percentile_cont
-------------+-----------------
0 | 0
1 | 1
2 | 2
3 | 3
4 | 4
(5 rows)
But my question is in the real world, would someone would try to run such type of queries? I am new to aggregate functions, specially the percentile functions.
You would use percentile_cont()
to get a percentage of some ordered value. For instance, if you had a population
column for the region, then you would calculate the median population as:
SELECT regionkey,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY population)
FROM "tpch/nation"
GROUP BY regionkey
ORDER BY regionkey;
Using regionkey
in the ORDER BY
is non-sensical. Within each group, regionkey
has only one value. Hence, the PERCENTILE_CONT()
can only return that value.
However, it can be quite useful with almost any other column.