Search code examples
sql-serveraggregate-functionscumulative-sumdatabase-cursor

Select running total until specific SUM is reached


I am trying to select the first n rowid values from the following table variable that will get me as close to a sum(itemcount) of 200,000 without crossing that threshhold. If I was looking at this manually, I would just take the top 3 rows. I do not want to use a cursor unless there is no pure-set-based way.

What is a good set-based way to get all of the rowid values "sum while/until" I get to a running total of 200,000?

I looked at "running totals" at http://www.1keydata.com/sql/sql-running-totals.html but that did not seem like it would work out because the real table has around 500k rows.

Here is what I have tried so far:

declare  @agestuff table ( rowid int primary key , itemcount int , itemage datetime )
insert into @agestuff values ( 1 , 175000 , '2013-01-24 17:21:40' )
insert into @agestuff values ( 2 , 300    , '2013-01-24 17:22:11' )
insert into @agestuff values ( 3 , 10000 , '2013-01-24 17:22:11' )
insert into @agestuff values ( 4 , 19000 , '2013-01-24 17:22:19' )
insert into @agestuff values ( 5 , 16000 , '2013-01-24 17:22:22' )
insert into @agestuff values ( 6 , 400   , '2013-01-24 17:23:06' )
insert into @agestuff values ( 7 , 25000 , '2013-01-24 17:23:06' )

select sum(itemcount) from @agestuff  -- 245700 which is too many

select sum(itemcount) from @agestuff  
  where rowid in (1,2,3) -- 185300 which gets me as close as possible

Using SQL Server 2008. I'll switch to 2012 if necessary.


Solution

  • Windowing Functions - SQL Server 2012 only

    DECLARE @point INT = 200000;
    
    ;WITH x(rowid, ic, r, s) AS
    (
      SELECT
        rowid, itemcount, ROW_NUMBER() OVER (ORDER BY itemage, rowid),
        SUM(itemcount) OVER (ORDER BY [itemage], rowid RANGE UNBOUNDED PRECEDING)
      FROM @agestuff
    )
    SELECT x.rowid, x.ic, x.s
    FROM x WHERE x.s <= @point
    ORDER BY x.rowid; 
    

    Results:

    rowid  ic      sum   
    -----  ------  ------
    1      175000  175000
    2      300     175300
    3      10000   185300
    

    SQL fiddle demo

    If you can't use SQL Server 2012 for some reason, then on SQL Server 2008 you can use a couple of alternatives:


    Quirky Update

    Note that this behavior is not documented, nor is it guaranteed to calculate your running totals in the correct order. So please use at your own risk.

    DECLARE @st TABLE
    (
        rowid INT PRIMARY KEY,
        itemcount INT,
        s INT
    );
     
    DECLARE @RunningTotal INT = 0;
     
    INSERT @st(rowid, itemcount, s)
      SELECT rowid, itemcount, 0
        FROM @agestuff
        ORDER BY rowid;
     
    UPDATE @st
      SET @RunningTotal = s = @RunningTotal + itemcount
      FROM @st;
     
    SELECT rowid, itemcount, s
      FROM @st
      WHERE s < @point
      ORDER BY rowid;
    

    Cursor

    DECLARE @st TABLE
    (
      rowid INT PRIMARY KEY, itemcount INT, s INT
    );
     
    DECLARE
      @rowid INT, @itemcount INT, @RunningTotal INT = 0;
     
    DECLARE c CURSOR LOCAL FAST_FORWARD
      FOR SELECT rowid, itemcount
        FROM @agestuff ORDER BY rowid;
     
    OPEN c;
     
    FETCH c INTO @rowid, @itemcount;
     
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @RunningTotal = @RunningTotal + @itemcount;
    
        IF @RunningTotal > @point
          BREAK;
     
        INSERT @st(rowid, itemcount, s)
          SELECT @rowid, @itemcount, @RunningTotal;
     
        FETCH c INTO @rowid, @itemcount;
    END
     
    CLOSE c;
    DEALLOCATE c;
     
    SELECT rowid, itemcount, s
      FROM @st
      ORDER BY rowid;
    

    I chose only two alternatives because others are even less desirable (mostly from a performance perspective). You can see them in the following blog post, with some background on how they perform and more information about potential gotchas. Don't paint yourself into a corner because you're stuck on the idea that cursors are bad - sometimes, like in this case, they can be the most efficient supported and reliable option:

    http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals