I have been using the new percentile_cont in Postgres to calculate percentiles for a table ever since it was launched. However, we are now changing the table to include cardinality for each row, and I'm unsure as to how to implement percentile_cont to take this into account.
Let's say the table looked like this before:
+--------+--------------+
| name | age |
+--------+--------------+
| Joe | 10 |
+--------+--------------+
| Bob | 11 |
+--------+--------------+
| Lisa | 12 |
+--------+--------------+
Calculating the 85th percentile for age in the set would simply be done using: percentile_cont(0.85) WITHIN group (ORDER BY age asc) 85
Now, we have a cardinality for each name (the number of people with that specific name). It looks something like this:
+--------------+--------+
| name | age | count |
+--------+-----+--------+
| Joe | 10 | 2 |
+--------+-----+--------+
| Bob | 11 | 1 |
+--------+-----+--------+
| Lisa | 12 | 1 |
+--------+-----+--------+
Is there any way to use percentile_cont or any other built in function in Postgres to calculate the percentile taking the count/cardinality into account?
The most obvious solution is to duplicate rows according to count
.
The first thing I thought of was a recursive query:
with recursive data (name, age, count) as (
select *
from a_table
union all
select name, age, count- 1
from data
where count > 1
)
select name, age
from data
order by name, age;
In Postgres, you can use generate_series()
to get the same in an easier and faster way.
select name, age
from a_table
cross join generate_series(1, count)
order by name, age;
Test it in db<>fiddle.