I have an interesting thing that I have to do in Redshift. Say, I have a table like this:
index,total,delta
0,3,null
1,5,2
2,10,5
3,11,1
4,null,4
5,null,6
6,null,2
where delta
is the difference between each total
and the previous row's total
. Here, my delta
s come from a different source than my total
s, so it is possible to receive delta
s without receiving updated total
s.
How can I impute the total
s according to the delta
s? Like so:
index,total,delta
0,3,null
1,5,2
2,10,5
3,11,1
4,15,4
5,21,6
6,23,2
I'm messing around somewhere in the vicinity of NVL(total, LAST_VALUE(total IGNORE NULLS) OVER (ORDER BY index ROWS UNBOUNDED PRECEDING) + SUM(delta) OVER (ORDER BY index ROWS UNBOUNDED PRECEDING)
, but that doesn't quite do it - I only want to SUM
the delta
s for which there are no corresponding total
.
You could use:
-- creating subgroups
WITH cte AS (
SELECT *, SUM(total IS NOT NULL::int) OVER(ORDER BY index) s
FROM tab
)
SELECT index,
CASE WHEN total IS NULL
THEN SUM(COALESCE(total,0) + delta) OVER(PARTITION BY s ORDER BY index)
-FIRST_VALUE(delta) OVER(PARTITION BY s ORDER BY index)
-- running total starting from first not null total + delta
-- decreased by first delta
ELSE total
END AS total
,delta
FROM cte
ORDER BY index;
Output:
┌────────┬────────┬───────┐
│ index │ total │ delta │
├────────┼────────┼───────┤
│ 0 │ 3 │ │
│ 1 │ 5 │ 2 │
│ 2 │ 10 │ 5 │
│ 3 │ 11 │ 1 │
│ 4 │ 15 │ 4 │
│ 5 │ 21 │ 6 │
│ 6 │ 23 │ 2 │
└────────┴────────┴───────┘
EDIT:
Actually there is no need for FIRST_VALUE
:
WITH cte AS (
SELECT *, SUM(total IS NOT NULL::int) OVER(ORDER BY index) s
FROM tab
)
SELECT index,
CASE WHEN total IS NULL
THEN SUM(COALESCE(total,0)
+ CASE WHEN total IS NOT NULL THEN 0 ELSE delta END)
OVER(PARTITION BY s ORDER BY index)
ELSE total
END AS total
,delta
FROM cte
ORDER BY index;