Search code examples
sqllagwindowing

how do I use windowing functions in sql to persist a record


I have a data set where i'm trying to create a "session id" based on a timestamp where certain event happens (i.e. load) in my case

My data:

userid  event  timestamp
xyz     load   '2016-12-01 08:21:13:000'
xyz     view   '2016-12-01 08:21:14:000'
xyz     view   '2016-12-01 08:21:16:000'
xyz     exit   '2016-12-01 08:21:17:000'
xyz     load   '2016-12-02 08:01:13:000'
xyz     view   '2016-12-02 08:01:16:000'
abc     load   '2016-12-01 08:11:13:000'
abc     view   '2016-12-01 08:11:14:000'

What i'm trying to achieve is to create a new column called session_start_timestamp where the row is marked against the last "load" for each user.

I know how to do this by creating a subset table (by taking the minimum timestamp and self joining), but is there a lag/lead/max/partition function that can do this instead?

The final output should look like:

userid  event  timestamp                  session_start_timestamp
xyz     load   '2016-12-01 08:21:13:000'  '2016-12-01 08:21:13:000'
xyz     view   '2016-12-01 08:21:14:000'  '2016-12-01 08:21:13:000'
xyz     view   '2016-12-01 08:21:16:000'  '2016-12-01 08:21:13:000'
xyz     exit   '2016-12-01 08:21:17:000'  '2016-12-01 08:21:13:000'
xyz     load   '2016-12-02 08:01:13:000'  '2016-12-02 08:01:13:000'
xyz     view   '2016-12-02 08:01:16:000'  '2016-12-02 08:01:13:000'
abc     load   '2016-12-01 08:11:13:000'  '2016-12-01 08:11:13:000'
abc     view   '2016-12-01 08:11:14:000'  '2016-12-01 08:11:13:000'

Solution

  • This is a gap/island problem:

    SQL DEMO (postgresql)

    1. You calculate the gap or break points.
    2. Then using cumulative SUM() calculate the groups
    3. Then select the MIN() time from each group

    --

    WITH gap as (
        SELECT *, CASE WHEN "event" = 'load' THEN 1 ELSE 0 END as gap
        FROM Table1
    ), island as (
        SELECT *, SUM(gap) OVER (PARTITION BY "userid" ORDER BY "timestamp" ) as grp
        FROM gap
    )    
    SELECT *, MIN("timestamp") OVER (PARTITION BY "userid", "grp") as new_timestamp
    FROM island
    

    OUTPUT

    enter image description here

    You can merge first two queries:

    WITH island as (
        SELECT *, SUM (CASE WHEN "event" = 'load' THEN 1 ELSE 0 END ) 
                  OVER (PARTITION BY "userid" ORDER BY "timestamp" ) as grp
        FROM Table1
    )    
    SELECT *, MIN("timestamp") OVER (PARTITION BY "userid", "grp") as new_timestamp
    FROM island