Search code examples
postgresqlfunctionwindowpercentilequartile

How do I group customers in to equal sets of revenue?


really struggling to come up with an answer for this one. There are various functions which provide Ntiles. Often these provide equal sets of rows per ntile.

I require to group customers by (approximate) equal sets of revenue. This way I can show the customers responsible for the top 10% of revenue and next 10% etc

In other words, all customers are sorted by revenue and grouped in to deciles of revenue.

        create table test_set (name text, revenue numeric);

        insert into test_set values ('a1',234.34);
        insert into test_set values ('a2',263.60);
        insert into test_set values ('a3',316.99);
        insert into test_set values ('a4',351.68);
        insert into test_set values ('a5',353.07);
        insert into test_set values ('a6',406.33;
        insert into test_set values ('a7',432.78);
        insert into test_set values ('a8',444.88);
        insert into test_set values ('a9',452.21);
        insert into test_set values ('a10',543.75);
        insert into test_set values ('a11',582.43);
        insert into test_set values ('a12',670.98);
        insert into test_set values ('a13',720.46);
        insert into test_set values ('a14',801.69);
        insert into test_set values ('a15',803.92);
        insert into test_set values ('a16',821.03);
        insert into test_set values ('a17',875.57);
        insert into test_set values ('a18',930.95);
        insert into test_set values ('a19',973.49);
        insert into test_set values ('a20',1055.48);
        insert into test_set values ('a21',1059.01);
        insert into test_set values ('a22',1131.09);
        insert into test_set values ('a23',1215.51);
        insert into test_set values ('a24',1232.19);
        insert into test_set values ('a25',1310.59);
        insert into test_set values ('a26',1353.31);
        insert into test_set values ('a27',1368.28);
        insert into test_set values ('a28',1412.58);
        insert into test_set values ('a29',1412.38);
        insert into test_set values ('a30',1489.78);
        insert into test_set values ('a31',1496.31);
        insert into test_set values ('a32',1587.17);
        insert into test_set values ('a33',3629.34);
        insert into test_set values ('a34',4642.01);
        insert into test_set values ('a35',6682.07);
        insert into test_set values ('a36',752.70);
        insert into test_set values ('a37',3819.59);
        insert into test_set values ('a38',2857.22);
        insert into test_set values ('a39',873.45);
        insert into test_set values ('a40',882.42);

Hope someone can help... Thanks


Solution

  • Using the cumulative SUM() window function might help:

    demo:db<>fiddle

    select
        name, revenue,
        sum(revenue) OVER () as total,
        sum(revenue) OVER (ORDER BY revenue) as cumulative,
        (sum(revenue) OVER (ORDER BY revenue) / sum(revenue) OVER () * 100) as cumulative_procent,
        trunc((sum(revenue) OVER (ORDER BY revenue) / sum(revenue) OVER () * 100) / 10) as group_by_tenth
    from test_set
    order by revenue
    
    1. column: Calculates the total SUM of revenues
    2. column: Calculates the cumulative SUM ordered by revenues
    3. column: Takes both terms for calculating the increasing procent value
    4. column: For clustering you can trunct this result to tenth parts