Search code examples
sqlpostgresqlgreenplum

Filter on window function according to hour


I want to use two different (yet similar) window functions to calculate two values SUM and COUNT on is_active over user_id+item, only up to the time of the row - minus 1 hour. My intuition was use ROWS UNBOUNDED PRECEDING but that way I can't filter over the time

COUNT(1) OVER(PARTITION BY user_id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING) 
SUM(is_active) OVER(PARTITION BY user-id, item ORDER BY req_time ROWS UNBOUNDED PRECEDING) 

However, that doesn't take under consideration the '1 hour ago' interval factor

Consider the following data:

user_id |     req_time       | item  | is_active |  
--------+--------------------+-------------------+---
1   | 2011-01-01 12:00:00|   1   |     0     |
1   | 2011-01-01 12:30:00|   1   |     1     |
1   | 2011-01-01 15:00:00|   1   |     1     |
1   | 2011-01-01 16:00:00|   1   |     0     |
1   | 2011-01-01 16:00:00|   2   |     0     |
1   | 2011-01-01 16:20:00|   2   |     1     |
2   | 2011-02-02 11:00:00|   1   |     1     |
2   | 2011-02-02 13:00:00|   1   |     0     |
1   | 2011-02-02 16:20:00|   1   |     0     |
1   | 2011-02-02 16:30:00|   2   |     0     |

I expect the following results: "value 1" is SUM(is_active) and "value 2" is COUNT(1):

user_id |     req_time       | item  | value 1 | value 2 |  
--------+--------------------+-----------------+---------+
1   | 2011-01-01 12:00:00|   1   |    0    |    0    |
1   | 2011-01-01 12:30:00|   1   |    0    |    0    |
1   | 2011-01-01 15:00:00|   1   |    1    |    2    |
1   | 2011-01-01 16:00:00|   1   |    2    |    3    |
1   | 2011-01-01 16:00:00|   2   |    0    |    0    |
1   | 2011-01-01 16:20:00|   2   |    0    |    0    |
2   | 2011-02-02 11:00:00|   1   |    0    |    0    |
2   | 2011-02-02 13:00:00|   1   |    1    |    1    |
1   | 2011-02-02 16:20:00|   1   |    2    |    4    |
1   | 2011-02-02 16:30:00|   2   |    1    |    2    |

I'm using Greenplum 4.21 which is based on Postgresql 8.2.15

Thanks in advance! gilibi


Solution

  • 8.3 at SQL Fiddle. Only one subselect.

    select user_id, req_time, item, v[1] as value1, v[2] as value2
    from (
        select t.*,
            (
                select array[
                    coalesce(sum(is_active::integer), 0),
                    count(*)
                    ] as v
                from t s
                where
                    user_id = t.user_id
                    and item = t.item
                    and req_time <= t.req_time - interval '1 hour'
            ) as v
        from t
    ) s
    order by req_time, user_id, item