Search code examples
sqlsql-servercommon-table-expressionwindow-functions

Set a Group number when cumulative column less than or equal to target


I have a column which contains how many sub_links a website has.

My table schema is id, sub_link, link

id  sub_link link group
1   5   link1.com   1
2   2   link2.com   2
3   4   link3.com   3
4   1   link4.com   3
4   1   link4.com   3
5   2   link5.com   4
6   4   link6.com   5
7   3   link7.com   6
7   3   link7.com   6

I want to add a column with group number when the total of sub_link is equal or less than 5 along with avoiding duplicates.

I don't know where to start since I can only write select statements and make some joins. Maybe there is a way using Window Function or CTE which I prefer to further my experience.

Here is a fiddle link

https://www.db-fiddle.com/f/6rmtcazWaWvLULZ5QgpmSb/1

Thank you for your help.


Solution

  • Based on your comment below, I'm not sure how to do this outside of keeping track of the rolling Sub_link sum in a separate temp table. An example of this below. IF you only needed to keep track of the Sub_links for current record and previous, may be way to accomplish with single query utilizing LAG function.

    USE [master]
    GO
    CREATE DATABASE [Test]
    GO
    
    USE [Test]
    GO
    
    CREATE TABLE [Test] (
        ID INT
      , Sub_Links TINYINT
      , Link VARCHAR(100)
    );
    
    INSERT INTO [Test]
    VALUES (1, 5, 'link1.com')
        , (2, 2, 'link2.com') 
        , (3, 4, 'link3.com')
        , (4, 1, 'link4.com')
        , (4, 1, 'link4.com')
        , (5, 2, 'link5.com')
        , (6, 4, 'link6.com')
        , (7, 3, 'link7.com')
        , (7, 3, 'link7.com');
    
    SET NOCOUNT ON
    GO
    CREATE TABLE #Staging
    (
          ID INT
        , Link VARCHAR(100)
        , Sub_Links INT
        , GroupNum INT
        , SublinkRollingSum TINYINT
    )
    GO
    
    CREATE CLUSTERED INDEX [StagingOrder] ON #Staging(ID, Link) --Since need to guarantee order, doing this upfront should be more efficient
    GO
    
    INSERT INTO #Staging(ID, Link, Sub_Links)
    SELECT DISTINCT --Don't include duplicate records
          ID
        , Link
        , Sub_Links
    FROM Test
    ORDER BY ID, Link
    GO
    --CREATE INDEX [GroupIndex] ON #Staging(GroupNum, SublinkRollingSum)--Intended to improve performance of below while loop
    --GO
    
    WITH FirstRecord AS
    (
        SELECT TOP(1) *
        FROM #Staging
        ORDER BY ID, Link
    )
    UPDATE FirstRecord
    SET GroupNum = 1, SublinkRollingSum = Sub_Links --This is the starting point
    
    DECLARE 
          @CurrentID INT
        , @CurrentLink VARCHAR(100)
        , @CurrentGroup INT
        , @SublinkRollingSum TINYINT
    
    SELECT TOP(1) 
          @CurrentID = ID
        , @CurrentLink = Link
        , @CurrentGroup = GroupNum
        , @SublinkRollingSum = SublinkRollingSum
    FROM #Staging
    ORDER BY ID, Link
    
    WHILE (@@ROWCOUNT > 0)
    BEGIN
        UPDATE #Staging
        SET   SublinkRollingSum = @SublinkRollingSum
            , GroupNum = @CurrentGroup
        WHERE ID = @CurrentID
            AND Link = @CurrentLink
    
        SELECT TOP(1)
              @CurrentID = ID
            , @CurrentLink = Link
            , @CurrentGroup = 
                CASE 
                    WHEN (@SublinkRollingSum + Sub_Links <= 5) 
                        THEN @CurrentGroup
                    ELSE @CurrentGroup + 1
                END
            , @SublinkRollingSum = 
                CASE 
                    WHEN (@SublinkRollingSum + Sub_Links <= 5) 
                        THEN @SublinkRollingSum + Sub_Links
                    ELSE Sub_Links
                END
        FROM #Staging
        WHERE ID > @CurrentID 
            OR (ID = @CurrentID AND Link <> @CurrentLink)
        ORDER BY ID, Link
    END
    
    SELECT 
          t.ID
        , t.Sub_Links
        , t.Link
        , s.GroupNum
    FROM #Staging S
        JOIN Test t ON s.ID = t.ID
            AND s.Link = t.Link
    ORDER BY t.ID, t.Link
    
    DROP TABLE #Staging
    
    --DROP DATABASE [Test]
    

    Screenshot of result set from SSMS