Search code examples
t-sqlself-reference

T-SQL Sum Since Condition Last Met


I'm trying to do a sort of running total with a variable window size. I want to sum a value from the current row to the last time another row met a condition.

Mock Data: I want a formula for the last column

ID condition_col col_to_sum running_total_since_last_true
01 TRUE 10 10
02 FALSE 8 18
03 FALSE 12 30
04 TRUE 5 5
05 FALSE 8 13
06 FALSE 10 23
07 FALSE -2 21
08 TRUE 1 1
09 FALSE 1 2

What I have tried:

select ID
     , condition_col
     , col_to_sum
     , (
        select sum(col_to_sum) as total
          from tbl t1
         where t1.ID between tbl.ID and (select top 1 t2.ID 
                                           from tbl t2 
                                          where condition_col = TRUE 
                                       order by t2.ID))
        ) as running_total_since_last_true
  from tbl

Because of the size of the dataset and the other things going on in the query (not shown here) this resulted in the query taking several minutes to run for a few thousand rows.

Now I'm looking at using sum() over(rows between current row and (select...)

I feel like I'm on the right track, but I keep getting stuck.

I also tried lag() but I couldn't make the lag reference itself properly.

Any ideas?


Solution

  • You would be better off putting your data into groups first and then using a cumulative SUM. You can put your data into groups by using a conditional cumulative COUNT (inside a CTE):

    WITH Grps AS(
        SELECT ID,
               condition_col,
               col_to_sum,
               COUNT(CASE condition_col WHEN 'TRUE' THEN 1 END) OVER (ORDER BY ID ASC
                                                                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
        FROM (VALUES(01,'TRUE',10,10),
                    (02,'FALSE',8,18),
                    (03,'FALSE',12,30),
                    (04,'TRUE',5,5),
                    (05,'FALSE',8,13),
                    (06,'FALSE',10,23),
                    (07,'FALSE',-2,21),
                    (08,'TRUE',1,1),
                    (09,'FALSE',1,2))V(ID,condition_col,col_to_sum,running_total_since_last_true))
    SELECT ID,
           condition_col,
           col_to_sum,
           SUM(col_to_sum) OVER (PARTITION BY Grp ORDER BY ID
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total_since_last_true
    FROM Grps;