Search code examples
sqlsnowflake-cloud-data-platformdistinct

Selecting Distinct Consecutive Values of Timeseries


I have a table in snowflake/dbt, where I want to select distinct sequential entries in the rows. For example: If I have

user_id session_id action timestamp
2 3 scroll 21-08-01 12:00:01
2 3 scroll 21-08-01 12:00:02
2 3 scroll 21-08-01 12:00:03
2 3 click 21-08-01 12:00:04
2 3 click 21-08-01 12:00:06
2 3 scroll 21-08-01 12:00:10
2 3 saved 21-08-01 12:00:10

I want to have this at the end

user_id session_id action timestamp
2 3 scroll 21-08-01 12:00:03
2 3 click 21-08-01 12:00:06
2 3 scroll 21-08-01 12:00:10
2 3 saved 21-08-01 12:00:10

I tried to use row_number() and next qualify but that will numerate all the action sequentially even if they are not.


Solution

  • I tried something a little bit different than ggordon, building an inline view with the contents of the "next" record (using the LEAD function).

    select user_id, session_id, action, ts
    from (
      select abc.*, 
             lead(user_id) ignore nulls 
               over (order by ts, user_id, session_id, action) next_user_id, 
             lead(session_id) ignore nulls 
               over (order by ts, user_id, session_id, action) next_session_id, 
             lead(action) ignore nulls 
               over (order by ts, user_id, session_id, action) next_action, 
             lead(ts) ignore nulls 
               over (order by ts, user_id, session_id, action) next_ts
      from   abc 
      order by ts, user_id, session_id, action)
    where user_id = NVL(next_user_id, user_id)
    and   session_id = NVL(next_session_id, session_id)
    and   action <> NVL(next_action, 'x')
    order by ts, user_id, session_id, action;
    

    This worked well and I was able to get the same four records you wanted.