I have a table in snowflake/dbt, where I want to select distinct sequential entries in the rows. For example: If I have
user_id | session_id | action | timestamp |
---|---|---|---|
2 | 3 | scroll | 21-08-01 12:00:01 |
2 | 3 | scroll | 21-08-01 12:00:02 |
2 | 3 | scroll | 21-08-01 12:00:03 |
2 | 3 | click | 21-08-01 12:00:04 |
2 | 3 | click | 21-08-01 12:00:06 |
2 | 3 | scroll | 21-08-01 12:00:10 |
2 | 3 | saved | 21-08-01 12:00:10 |
I want to have this at the end
user_id | session_id | action | timestamp |
---|---|---|---|
2 | 3 | scroll | 21-08-01 12:00:03 |
2 | 3 | click | 21-08-01 12:00:06 |
2 | 3 | scroll | 21-08-01 12:00:10 |
2 | 3 | saved | 21-08-01 12:00:10 |
I tried to use row_number() and next qualify but that will numerate all the action sequentially even if they are not.
I tried something a little bit different than ggordon, building an inline view with the contents of the "next" record (using the LEAD function).
select user_id, session_id, action, ts
from (
select abc.*,
lead(user_id) ignore nulls
over (order by ts, user_id, session_id, action) next_user_id,
lead(session_id) ignore nulls
over (order by ts, user_id, session_id, action) next_session_id,
lead(action) ignore nulls
over (order by ts, user_id, session_id, action) next_action,
lead(ts) ignore nulls
over (order by ts, user_id, session_id, action) next_ts
from abc
order by ts, user_id, session_id, action)
where user_id = NVL(next_user_id, user_id)
and session_id = NVL(next_session_id, session_id)
and action <> NVL(next_action, 'x')
order by ts, user_id, session_id, action;
This worked well and I was able to get the same four records you wanted.