Search code examples
postgresqlpercentile

Postgres percentile_cont with cardinality


I have been using the new percentile_cont in Postgres to calculate percentiles for a table ever since it was launched. However, we are now changing the table to include cardinality for each row, and I'm unsure as to how to implement percentile_cont to take this into account.

Let's say the table looked like this before:

+--------+--------------+
| name   | age          |
+--------+--------------+
|  Joe   | 10           |
+--------+--------------+
|  Bob   | 11           |
+--------+--------------+
|  Lisa  | 12           |
+--------+--------------+

Calculating the 85th percentile for age in the set would simply be done using: percentile_cont(0.85) WITHIN group (ORDER BY age asc) 85

Now, we have a cardinality for each name (the number of people with that specific name). It looks something like this:

+--------------+--------+
| name   | age | count  |
+--------+-----+--------+
|  Joe   | 10  |   2    |
+--------+-----+--------+
|  Bob   | 11  |   1    |
+--------+-----+--------+
|  Lisa  | 12  |   1    |
+--------+-----+--------+

Is there any way to use percentile_cont or any other built in function in Postgres to calculate the percentile taking the count/cardinality into account?


Solution

  • The most obvious solution is to duplicate rows according to count.

    The first thing I thought of was a recursive query:

    with recursive data (name, age, count) as (
        select * 
        from a_table
    union all
        select name, age, count- 1
        from data
        where count > 1
        )
    select name, age 
    from data
    order by name, age;
    

    In Postgres, you can use generate_series() to get the same in an easier and faster way.

    select name, age
    from a_table
    cross join generate_series(1, count)
    order by name, age;
    

    Test it in db<>fiddle.