Search code examples
sqlrecursive-cte

To find infinite recursive loop in CTE


I'm not a SQL expert, but if anybody can help me.

I use a recursive CTE to get the values as below.

Child1 --> Parent 1

Parent1 --> Parent 2

Parent2 --> NULL

If data population has gone wrong, then I'll have something like below, because of which CTE may go to infinite recursive loop and gives max recursive error. Since the data is huge, I cannot check this bad data manually. Please let me know if there is a way to find it out.

Child1 --> Parent 1

Parent1 --> Child1

or

Child1 --> Parent 1

Parent1 --> Parent2

Parent2 --> Child1


Solution

  • You haven't specified the dialect or your column names, so it is difficult to make the perfect example...

    -- Some random data
    IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
        DROP TABLE #MyTable
    
    CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
    INSERT INTO #MyTable (ID, ParentID, Description) VALUES
    (1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
    (2, 1, 'Child'), -- Try changing the second value (1) to 2 
    (3, 2, 'SubChild')
    -- End random data
    
    ;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
    (
        SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
        UNION ALL
        SELECT R.StartingID, R.Level + 1, 
            R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
            CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
            MT.*
            FROM #MyTable MT
            INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
    )
    
    SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description 
        FROM RecursiveCTE 
        ORDER BY StartingID, Level
    

    Something like this will show if/where there are loops in the recursive cte. Look at the column Loop. With the data as is, there is no loops. In the comments there are examples on how to change the values to cause a loop.

    In the end the recursive cte creates a VARCHAR(MAX) of ids in the form |id1|id2|id3| (called Parents) and then checks if the current ID is already in that "list". If yes, it sets the Loop column to 1. This column is checked in the recursive join (the ABD R.Loop = 0).

    The ending query uses a MAX() OVER (PARTITION BY ...) to set to 1 the Loop column for a whole "block" of chains.

    A little more complex, that generates a "better" report:

    -- Some random data
    IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
        DROP TABLE #MyTable
    
    CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
    INSERT INTO #MyTable (ID, ParentID, Description) VALUES
    (1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
    (2, 1, 'Child'), -- Try changing the second value (1) to 2 
    (3, 3, 'SubChild')
    -- End random data
    
    -- The "terminal" childrens (that are elements that don't have childrens
    -- connected to them)
    ;WITH WithoutChildren AS
    (
        SELECT MT1.* FROM #MyTable MT1
            WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
    )
    
    , RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
    (
        SELECT ID, -- StartingID 
            1, -- Level
            '|' + CAST(ID AS VARCHAR(MAX)) + '|', 
            '|' + CAST(Description AS VARCHAR(MAX)) + '|', 
            0, -- Loop
            ParentID
            FROM WithoutChildren
        UNION ALL
        SELECT R.StartingID, -- StartingID
            R.Level + 1, -- Level
            R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
            R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|', 
            CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
            MT.ParentID
            FROM #MyTable MT
            INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
    )
    
    SELECT * FROM RecursiveCTE 
        WHERE ParentID IS NULL OR Loop = 1
    

    This query should return all the "last child" rows, with the full parent chain. The column Loop is 0 if there is no loop, 1 if there is a loop.