I'm trying to do a sort of running total with a variable window size. I want to sum a value from the current row to the last time another row met a condition.
Mock Data: I want a formula for the last column
ID | condition_col | col_to_sum | running_total_since_last_true |
---|---|---|---|
01 | TRUE | 10 | 10 |
02 | FALSE | 8 | 18 |
03 | FALSE | 12 | 30 |
04 | TRUE | 5 | 5 |
05 | FALSE | 8 | 13 |
06 | FALSE | 10 | 23 |
07 | FALSE | -2 | 21 |
08 | TRUE | 1 | 1 |
09 | FALSE | 1 | 2 |
What I have tried:
select ID
, condition_col
, col_to_sum
, (
select sum(col_to_sum) as total
from tbl t1
where t1.ID between tbl.ID and (select top 1 t2.ID
from tbl t2
where condition_col = TRUE
order by t2.ID))
) as running_total_since_last_true
from tbl
Because of the size of the dataset and the other things going on in the query (not shown here) this resulted in the query taking several minutes to run for a few thousand rows.
Now I'm looking at using sum() over(rows between current row and (select...)
I feel like I'm on the right track, but I keep getting stuck.
I also tried lag() but I couldn't make the lag reference itself properly.
Any ideas?
You would be better off putting your data into groups first and then using a cumulative SUM
. You can put your data into groups by using a conditional cumulative COUNT
(inside a CTE):
WITH Grps AS(
SELECT ID,
condition_col,
col_to_sum,
COUNT(CASE condition_col WHEN 'TRUE' THEN 1 END) OVER (ORDER BY ID ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM (VALUES(01,'TRUE',10,10),
(02,'FALSE',8,18),
(03,'FALSE',12,30),
(04,'TRUE',5,5),
(05,'FALSE',8,13),
(06,'FALSE',10,23),
(07,'FALSE',-2,21),
(08,'TRUE',1,1),
(09,'FALSE',1,2))V(ID,condition_col,col_to_sum,running_total_since_last_true))
SELECT ID,
condition_col,
col_to_sum,
SUM(col_to_sum) OVER (PARTITION BY Grp ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total_since_last_true
FROM Grps;