Search code examples
sqlpostgresqlwindow-functionsgreenplumanalytical

Is it possible to find out the number of items in a row by using window functions in PostgreSQL?


how to find the number of sellers who have payments, the time between which in a row is less than 1 minute and which are executed at least 3 times in a row? (answer is 2 sellers) and how to calculate the number of such payments? (answer is 10 payments) it seems like such a problem can be solved using window functions, but I have never encountered this kind of problem

enter image description here

CREATE TABLE T (seller_id int, payment_id varchar(3), payment_time timestamp, second_diff int);
    
INSERT INTO T (seller_id, payment_id, payment_time, second_diff)
VALUES
    (1, 'pl',  '2015-01-08 09:23:04', 151),
    (1, 'p2',  '2015-01-08 09:25:35', 50),
    (1, 'p3',  '2015-01-08 09:26:25', 48),
    (1, 'p4',  '2015-01-08 09:27:23', 36),
    (1, 'p5',  '2015-01-08 09:27:59', 41),
    (1, 'p6',  '2015-01-08 09:28:40', 70),
    (1, 'p7',  '2015-01-08 09:29:50', 50),
    (1, 'p8',  '2015-01-08 09:30:40', 45),
    (1, 'p9',  '2015-01-08 09:31:25', 35),
    (1, 'p10', '2015-01-08 09:32:00', null),
    (2, 'pll', '2015-01-08 09:25:35', 25),
    (2, 'p12', '2015-01-08 09:26:00', 55),
    (2, 'p13', '2015-01-08 09:26:55', 30),
    (2, 'p14', '2015-01-08 09:27:25', 95),
    (2, 'p15', '2015-01-08 09:29:00', null),
    (3, 'p16', '2015-01-08 10:41:00', 65),
    (3, 'p17', '2015-01-08 10:42:05', 75),
    (3, 'p18', '2015-01-08 10:43:20', 90),
    (3, 'p19', '2015-01-08 10:43:20', 39),
    (3, 'p20', '2015-01-08 10:43:59', 50),
    (3, 'p21', '2015-01-08 10:44:49', null);

Solution

  • with A as (
        select seller_id, payment_time, second_diff,
            case when
                lag(case when second_diff < 60 then 1 else 0 end)
                    over (partition by seller_id order by payment_time)
                  = case when second_diff < 60 then 1 else 0 end
                then 0 else 1 end as transition
        from T
    ), B as (
        select *,
            sum(transition)
                over (partition by seller_id order by payment_time) as grp
        from A
    ), C as (
        select seller_id, count(*) as p
        from B
        where second_diff < 60
        group by seller_id, grp
        having count(*) >= 3
    ) 
    select count(distinct seller_id) as sellers, sum(p) as payments
    from C;
    

    This approach looks for transitions in the values, counting them up. The output values of the inner case expressions don't really matter as long as they match.

    https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=606b796d793248336a95637f02ce117b

    Here are some variations on the theme:

    Option #1b:

    with A as (
        select seller_id, payment_time, second_diff,
            case when
                lag(case when second_diff < 60 then 1 else 0 end)
                    over (partition by seller_id order by payment_time)
                  = case when second_diff < 60 then 1 else 0 end
                then 0 else 1 end as transition
        from T
    ), B as (
        select *,
            sum(transition)
                over (partition by seller_id order by payment_time) as grp
        from A
    )
    select
        dense_rank() over (order by seller_id)
          + dense_rank() over (order by seller_id desc) - 1 as sellers,
        sum(count(*)) over () as payments
    from B
    where second_diff < 60
    group by seller_id, grp
    having count(*) >= 3
    limit 1;
    

    This is just a different way to do the count(distinct) in one step.

    Option #2:

    with A as (
        select seller_id, payment_time, second_diff,
            row_number() over (partition by seller_id order by payment_time) as rn
        from T
    ), B as (
        select *,
            rn - row_number() over (partition by seller_id order by payment_time) as grp
        from A
        where second_diff < 60    
    ), C as (
        select seller_id, count(*) as p
        from B
        group by seller_id, grp
        having count(*) >= 3
    ) 
    select count(distinct seller_id) as sellers, sum(p) as payments
    from C;
    

    This method uses row numbering pre- and post-filtering find breaks in the series.