Search code examples
sqlsql-server-2012hierarchyhierarchical-datatransitive-closure-table

How to turn this cursor operation into a set based operation on a closure hierarchy?


  1. Trees.DirectReports is a closure (hierarchy) table.
  2. There is a table called Users with: RowID, EmployeeId, and MangerId. @RowCount is the number of records in this table and #EmpMgr is a cursor for this table.

Below is the relevant sql code that I would like to convert from a cursor based operation to a set based operation.

WHILE @RowCount <= @NumberRecords --loop through each record in Users table
BEGIN       
    SET @EmpId = (SELECT EmployeeId FROM #EmpMgr WHERE RowID = @RowCount)
    SET @MgrId = (SELECT ManagerId FROM #EmpMgr WHERE RowID = @RowCount)

    INSERT INTO [Trees].[DirectReports](EmployeeId, ManagerId, Depth)
    SELECT c.EmployeeId, p.ManagerId, p.Depth + c.Depth + 1
    FROM Trees.DirectReports p join Trees.DirectReports c
    WHERE p.EmployeeId = @MgrId AND c.ManagerId = @EmpId

    SET @RowCount = @RowCount + 1
END

So I'd really like to figure out how to do this as a set query because I know it would be much faster that way but my brain isn't quite making the proper connections today to figure it out.

*Note that to answer this question you will need to already understand how closure tables work. Otherwise the above probably won't make sense.


Solution

  • Found what I was looking for with the help of a couple other posts. The main answer is this:

    WITH cte AS
    (
        SELECT LegacyId ancestor, LegacyId descendant, 0 depth FROM Users
        UNION ALL
    
        SELECT cte.ancestor, u.LegacyId descendant, cte.depth + 1 depth
        FROM   dbo.Users u JOIN cte ON u.ManagerId = cte.descendant
    )
    select * from cte
    

    However, what threw me off at first was that there was some bad data causing circular dependencies. I was able to use the following query to identify where those instances were:

    with cte (id,pid,list,is_cycle) 
    as
    (
        select      legacyid id, managerid pid,',' + cast (legacyid as varchar(max))  + ',',0
        from        users
    
        union all
    
        select      u.legacyid id, 
                    u.managerid pid, 
                    cte.list + cast(u.legacyid as varchar(10)) +  ',' ,case when cte.list like '%,' + cast (u.legacyid as varchar(10)) + ',%' then 1 else 0 end
        from        cte join users u on u.managerid = cte.id
        where       cte.is_cycle = 0
    )
    select      *
    from        cte
    where       is_cycle = 1
    

    Once I corrected the cyclical data everything worked great. Check out the following SO posts for more information as these are what I used to come up with my solution: Is there a way to detect a cycle in Hierarchical Queries in SQL Server? and How can I create a closure table using data from an adjacency list?