Search code examples
sqlhivelagwindow-functions

How to use Hive sql's lag function with two columns?


I feel like I have a fairly simple SQL problem to solve, just don't know how to search for it correctly.

Say I have a table where values are updated based on time:

|timestamp|value|session|
|---------|-----|-------|
| ts1     | v1  |  s1   |
| ts2     | v2  |  s1   |
| ts3     | v3  |  s1   |
| ...     | ..  |  s2   |

I want to get current value and previous value with associated timestamps.

So the result should be:

|timestamp_current|value_current|timestamp_prev|value_prev|
|-----------------|-------------|--------------|----------|
|      ts2        |      v2     |    ts1       |    v1    |
|      ts3        |      v3     |    ts2       |    v2    |
|      ...        |      ..     |    ...       |    ..    |

I think the following query is correct if I was only looking to get the previous value, not the previous timestamp:

select timestamp, value, lag(value,1) over (partition by (session) order by timestamp) from mytable

However, what is the correct way of adding TWO values from the previous row, do I add two lag clauses or is there a better way?


Solution

  • You can derive your result by using lag() twice; once for prev_timestamp and once for prev_val as below.

    select * from
    (
    select timestamp, 
           value, 
           lag(timestamp) over(partition by session order by timestamp) as prev_timestamp, 
           lag(value) over(partition by session order by timestamp) as prev_value
    from table1
    ) t
    where prev_timestamp is not null
    

    where clause is used to exclude rows having prev_timestamp as NULL

    Result:

    +-----------+-------+----------------+------------+
    | timestamp | value | prev_timestamp | prev_value |
    +-----------+-------+----------------+------------+
    | ts2       | v2    | ts1            | v1         |
    | ts3       | v3    | ts2            | v2         |
    +-----------+-------+----------------+------------+
    

    DEMO