There are many ways to get a lagged value of a certain column in SQL, e.g.:
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
curr.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
,or:
select variable_of_interest
,lag(variable_of_interest ,1)
over(partition by
some_group order by variable_1,...,variable_n)
as lag_variable_of_interest
from DATA
I use the second version, but my code runs very slow when "lagging" many variables, such that my code becomes:
select variable_of_interest_1
,variable_of_interest_2
,variable_of_interest_3
,lag(variable_of_interest_1 ,1)
over(partition by
some_group order by variable_1,...,variable_n)
as lag_variable_of_interest_1
,lag(variable_of_interest_2 ,1)
over(partition by
some_group order by variable_1,...,variable_n)
as lag_variable_of_interest_2
,lag(variable_of_interest_3 ,1)
over(partition by
some_group order by variable_1,...,variable_n)
as lag_variable_of_interest_3
from DATA
I wonder, is this because each lag function must by its own partition and order the whole data set, even though the are using the same partition and order?
I am not 100% sure about how DB2 optimizes such queries. If it executes each lag independently, then there is definitely room to improve the optimizer.
One method you could use is lag()
with a join
on the primary key:
select t.*, tprev.*
from (select t.*, lag(id) over ( . . . ) as prev_id
from t
) t left join
t tprev
on t.id = tprev.prev_id ;
From what you describe, this might be the most efficient method to do what you want.
This should be more efficient than row_number()
because the join can make use of an index.