Search code examples
sqlamazon-athenapresto

Access "real" previous value with SQL lag having duplicate entries for same key


How would it be possible to a get previous timestamp (or any other key for that matter) when there are multiple entries for the same timestamp in SQL (presto).

This is an ideal example outcome

user timestamp category previous timestamp
Mary 1000 a
Mary 1050 b 1000
Mary 1050 a 1000
Mary 1050 c 1000

however if using lag(timestamp,1) over (partition by user order by timestamp) as previous_timestamp you get the following

user timestamp category previous timestamp
Mary 1000 a
Mary 1050 b 1000
Mary 1050 a 1050
Mary 1050 c 1050

which makes sense given that lag() uses the previous row but it's not the expected outcome. Is there any way to make this work?


Solution

  • Syntax for SQL Server, but the logic should work.

    declare @some_table table (
      [user] varchar(5),
      [timestamp] int,
      [category] varchar(5)
    )
    
    insert @some_table
    values 
      ('Mary', 1000, 'a')
    , ('Mary', 1050, 'b')
    , ('Mary', 1050, 'a')
    , ('Mary', 1050, 'c')
    
    select *
    from @some_table
    
    ;
    with a1 as (
      select distinct [user]
      , [timestamp]
      from @some_table
    ),
    a2 as (
      select distinct [user]
      , [timestamp]
      , lag([timestamp],1) over (partition by [user] order by [timestamp]) as previous_timestamp
      from a1
    )
    
    select a.[user]
    , a.[timestamp]
    , b.category
    , a.previous_timestamp
    
    from (
        select [user]
        , [timestamp]
        , lag([timestamp],1) over (partition by [user] order by [timestamp]) as previous_timestamp
        from a1
      ) a 
      join @some_table b on a.[user] = b.[user] 
                     and a.[timestamp] = b.[timestamp];