Search code examples
sqlpostgresqlsumamazon-redshiftwindow-functions

Row number with condition


I want to increase the row number of a partition based on a condition. This question refers to the same problem, but in my case, the column I want to condition on is another window function.

I want to identify the session number of each user (id) depending on how long ago was their last recorded action (ts).

My table looks as follows:

id  ts         
 1  2022-08-01 09:00:00 -- user 1, first session
 1  2022-08-01 09:10:00
 1  2022-08-01 09:12:00
 1  2022-08-03 12:00:00 -- user 1, second session
 1  2022-08-03 12:03:00
 2  2022-08-01 11:04:00 -- user 2, first session
 2  2022-08-01 11:07:00
 2  2022-08-25 10:30:00 -- user 2, second session
 2  2022-08-25 10:35:00
 2  2022-08-25 10:36:00

I want to assign each user a session identifier based on the following conditions:

  • If the user's last action was 30 or more minutes ago (or doesn't exist), then increase (or initialize) the row number.
  • If the user's last action was less than 30 minutes ago, don't increase the row number.

I want to get the following result:

id  ts                   session_id
 1  2022-08-01 09:00:00           1
 1  2022-08-01 09:10:00           1
 1  2022-08-01 09:12:00           1
 1  2022-08-03 12:00:00           2
 1  2022-08-03 12:03:00           2
 2  2022-08-01 11:04:00           1
 2  2022-08-01 11:07:00           1
 2  2022-08-25 10:30:00           2
 2  2022-08-25 10:35:00           2
 2  2022-08-25 10:36:00           2

If I had a separate column with the seconds since their last session, I could simply add 1 to each user's partitioned sum. However, this column is a window function itself. Hence, the following query doesn't work:

select
  id
  ,ts
  ,extract(
    epoch from (
      ts - lag(ts, 1) over(partition by id order by ts)
    )
  ) as seconds_since -- Number of seconds since last action (works well)
  ,sum(
    case
      when coalesce(
        extract(
          epoch from (
            ts - lag(ts, 1) over (partition by id order by ts)
          )
        ), 1800
      ) >= 1800 then 1
      else 0 end
  ) over (partition by id order by ts) as session_id -- Window inside window (crashes)
from
  t
order by
  id
  ,ts

ERROR: Aggregate window functions with an ORDER BY clause require a frame clause


Solution

  • Use LAG() window function to get the previous ts of each row and create flag column indicating if the difference between the 2 timestamps is greater than 30 minutes.
    Then use SUM() window function over that flag:

    SELECT
      id
      ,ts
      ,SUM(flag) OVER (
        PARTITION BY id
        ORDER BY ts
        rows unbounded preceding -- necessary in aws-redshift
      ) as session_id
    FROM (
      SELECT
        *
        ,COALESCE((LAG(ts) OVER (PARTITION BY id ORDER BY ts) < ts - INTERVAL '30 minute')::int, 1) flag
      FROM
        tablename
    ) t
    ;
    

    See the demo.