I have a data set where i'm trying to create a "session id" based on a timestamp where certain event happens (i.e. load) in my case
My data:
userid event timestamp
xyz load '2016-12-01 08:21:13:000'
xyz view '2016-12-01 08:21:14:000'
xyz view '2016-12-01 08:21:16:000'
xyz exit '2016-12-01 08:21:17:000'
xyz load '2016-12-02 08:01:13:000'
xyz view '2016-12-02 08:01:16:000'
abc load '2016-12-01 08:11:13:000'
abc view '2016-12-01 08:11:14:000'
What i'm trying to achieve is to create a new column called session_start_timestamp where the row is marked against the last "load" for each user.
I know how to do this by creating a subset table (by taking the minimum timestamp and self joining), but is there a lag/lead/max/partition function that can do this instead?
The final output should look like:
userid event timestamp session_start_timestamp
xyz load '2016-12-01 08:21:13:000' '2016-12-01 08:21:13:000'
xyz view '2016-12-01 08:21:14:000' '2016-12-01 08:21:13:000'
xyz view '2016-12-01 08:21:16:000' '2016-12-01 08:21:13:000'
xyz exit '2016-12-01 08:21:17:000' '2016-12-01 08:21:13:000'
xyz load '2016-12-02 08:01:13:000' '2016-12-02 08:01:13:000'
xyz view '2016-12-02 08:01:16:000' '2016-12-02 08:01:13:000'
abc load '2016-12-01 08:11:13:000' '2016-12-01 08:11:13:000'
abc view '2016-12-01 08:11:14:000' '2016-12-01 08:11:13:000'
This is a gap/island problem:
SQL DEMO (postgresql)
SUM()
calculate the groupsMIN()
time from each group--
WITH gap as (
SELECT *, CASE WHEN "event" = 'load' THEN 1 ELSE 0 END as gap
FROM Table1
), island as (
SELECT *, SUM(gap) OVER (PARTITION BY "userid" ORDER BY "timestamp" ) as grp
FROM gap
)
SELECT *, MIN("timestamp") OVER (PARTITION BY "userid", "grp") as new_timestamp
FROM island
OUTPUT
You can merge first two queries:
WITH island as (
SELECT *, SUM (CASE WHEN "event" = 'load' THEN 1 ELSE 0 END )
OVER (PARTITION BY "userid" ORDER BY "timestamp" ) as grp
FROM Table1
)
SELECT *, MIN("timestamp") OVER (PARTITION BY "userid", "grp") as new_timestamp
FROM island