Search code examples
sqlsql-servert-sqlrecursive-queryvistadb

Given a recursive query that starts with a child, how can I eliminate sibling and parent rows?


I have managed to build a recursive query which returns rows for the selected Id and all its children. This works absolutely fine for the ultimate parent, but I need it also to work correctly when the passed Id is that of one of the children, showing just the child and its children if any. Currently it still returns other child rows of the ultimate parent plus the passed child row displays twice...

As with a previous issue I have to do this using the sub-query format, because other TSQL based database engines than SQL Server may be used that do not support CTE or the WITH clause.

Desired Outcome:

Using Id 2, the correct data is returned: 2, 3, 4, 6, 7. Using Id 6, it should return 6, 7 only. Currently the query returns 6,3,4, 6,7.

Data:

ProjectId   ProjectName                             ParentId
1           Test Project                            -1
2           Test Project 2                          0
3           Test Project 2 Sub Project 1            2
4           Test Project 2 Sub Project 2            2
5           Test Project 3                          -1
6           Test Project 2 Sub Sub Project 1        3
7           Test Project 2 Sub Sub Sub Project 1    6

Query:

DECLARE @PROJECTID BIGINT = 2;

SELECT *
FROM            
(
    SELECT *
    FROM ProjectCostingProjects pcp
    WHERE pcp.[ProjectId] = @PROJECTID 
    UNION ALL
    SELECT  pcp2.*
    FROM    ProjectCostingProjects pcp2
    JOIN    ProjectCostingProjects pcp
    ON     pcp2.ParentID = pcp.ProjectId
);

Any advice or suggestions gratefully received.


Solution

  • Here is an answer that uses a while loop instead.

    Solution 1, using while loop and a temp table

    /* Populating the temp table with the data */
    DECLARE @recurse TABLE
    (projectId INT,parent int);
    
    INSERT INTO @recurse (projectId,parent) VALUES (1,-1);
    INSERT INTO @recurse (projectId,parent) VALUES (2,-0);
    INSERT INTO @recurse (projectId,parent) VALUES (3,2);
    INSERT INTO @recurse (projectId,parent) VALUES (4,2);
    INSERT INTO @recurse (projectId,parent) VALUES (5,-1);
    INSERT INTO @recurse (projectId,parent) VALUES (6,3);
    INSERT INTO @recurse (projectId,parent) VALUES (7,6);
    
    DECLARE @recurse2 TABLE
    (projectId INT,parent INT);
    
    --Start by inserting the root element
    INSERT INTO @recurse2 ( projectId, parent)
    SELECT * FROM @recurse WHERE projectId = 2
    
    --insert elements until all children have all children
    WHILE EXISTS (SELECT * FROM @recurse WHERE parent IN (SELECT projectId FROM @recurse2) AND projectId NOT IN (SELECT projectId FROM @recurse2) )
    BEGIN
        INSERT INTO @recurse2
        (
            projectId,
            parent
        )
        SELECT * FROM @recurse WHERE parent IN (SELECT projectId FROM @recurse2) AND projectId NOT IN (SELECT projectId FROM @recurse2)
    END 
    
    SELECT * FROM @recurse2
    

    Solution 2 For performance you can create a intermidate table with the result from the while loop. This intermidiate table can be updated on an intervall, or as part of the creating an element in the main table. This could be either a part of your business logic or as a DB trigger.

    Here is the code I would write if you wanted this as a scheduled job:

    -- Populating the temp table with the data 
    DECLARE @recurse TABLE
    (projectId INT,parent int);
    
    INSERT INTO @recurse (projectId,parent) VALUES (1,-1);
    INSERT INTO @recurse (projectId,parent) VALUES (2,-0);
    INSERT INTO @recurse (projectId,parent) VALUES (3,2);
    INSERT INTO @recurse (projectId,parent) VALUES (4,2);
    INSERT INTO @recurse (projectId,parent) VALUES (5,-1);
    INSERT INTO @recurse (projectId,parent) VALUES (6,3);
    INSERT INTO @recurse (projectId,parent) VALUES (7,6);
    
    DECLARE @recurse2 TABLE
    (projectId INT,parent INT, lvl int);
    
    --Start by inserting all elements root at lvl 0
    INSERT INTO @recurse2 ( projectId, parent, lvl ) SELECT projectId, parent, 0 FROM @recurse 
    SELECT * FROM @recurse2
    
    --insert elements until we have all parents for all elements
    WHILE EXISTS (SELECT * FROM @recurse2 a WHERE lvl IN (SELECT TOP 1 lvl FROM @recurse2 b WHERE a.projectId = b.projectId ORDER BY lvl DESC) AND a.parent > 0 )
    BEGIN
        --Insert the next parent for all elements that does not have a their top level parent allready
        INSERT INTO @recurse2 ( projectId, parent , lvl )
        SELECT  projectId, 
                (SELECT b.parent FROM @recurse b WHERE b.projectId = a.parent), 
                lvl + 1 
        FROM @recurse2 a WHERE lvl IN (SELECT TOP 1 lvl FROM @recurse2 b WHERE a.projectId = b.projectId ORDER BY lvl DESC) AND a.parent > 0 
    END 
    
    --Find all children to an element
    SELECT * FROM @recurse2 WHERE parent = 2
    

    The big advantage of solution #2 is that the performance should be REALY good for reads, probably even better than CTE's. Also it works equally well for reading from bottom to top, as top to bottom.