Search code examples
sql-servert-sqlrecursive-query

Recursive CTE and performance issue when anchor member is a large table


I have made some edits so that the post is clearer.

I have a large table with data like below.

enter image description here

The goal is to be able to retrieve the original ID and location associated with the original ID given any ID as input. For example, if I were to look for ID 2,4,9, the result should be

enter image description here

Two questions:

1)How do I add to this script a column that represents the number of separation degree an ID from the original ID? for example, ID 2 would be 0 (original/parent), ID 4 would be 2 and ID 9 would be 1 (it takes 3 IDs to get to original location) --This question is answered by @ValNik Here is a simplified script I used:

DECLARE @IDs TABLE (
  ID INTEGER
  ,PreviousID INTEGER
  ,Location INTEGER
)

INSERT INTO @IDs
SELECT           2,null,1235
UNION ALL SELECT 3,2,1236
UNION ALL SELECT 4,3,1239
UNION ALL SELECT 8,null,1237
UNION ALL SELECT 9,8,1234
UNION ALL SELECT 10,9,1235

Select * from @IDs


DECLARE @ORDERID Table (OrderID nvarchar (100))
Insert into @ORDERID values
('4')
,('9')
,('2')

;WITH q AS (
    SELECT  ID, PreviousID,Location
    FROM    @IDs
    where ID in (select OrderID from @ORDERID) 
       -- or PreviousID in (select OrderID from @ORDERID)
    UNION ALL 
    SELECT  q.ID, u.PreviousID,q.Location
    FROM    q
            INNER JOIN @IDs u ON u.ID = q.PreviousID
            --and q.ID in (select OrderID from @ORDERID)
)


,CTE_Original as
(
SELECT  q.ID
        ,q.Location
        ,case when Min(PreviousID) is null then ID
                else min(PreviousID) end as OriginalID

FROM    q
GROUP BY q.ID,q.Location
)

Select CTE_Original.*,Original.Location as OriginalLocation from CTE_Original 
left join @IDs Original on Original.ID = CTE_Original.OriginalID
                        
where CTE_Original.ID in (select OrderID from @ORDERID)
order by ID
  1. In real life, @IDs is a temporary table with 3M+ rows that originated from joining a couple tables. The equivalence of @IDs in the example above is #CTE_ORDERID. It takes about 20s to insert values to generate CTE_ORDERID and insert them to #CTE_ORDERID. However, the recursive CTE q takes >10 mins. What can I do in order to have better performance? I don't have access to execution plan.

Here is the script

DECLARE @ORDERID Table (OrderID nvarchar (100))
Insert into @ORDERID values
('119309645')
,('115821862')
,('112942594')

;


Drop table if exists #CTE_OrderID

;
;WITH CTE_OrderID AS (
    SELECT ORDER_MED.ORDER_MED_ID as CURRENT_ORDERID
        ,ORDER_MED.CHNG_ORDER_MED_ID as PREV_ORDERID
        ,CLARITY_DEP.DEPARTMENT_NAME    as WRITTEN_LOCATION
        ,ORDER_MED.ORDERING_DATE        as WRITTEN_DATE
    FROM Clarity_PRD_Report.dbo.ORDER_MED (nolock)  
        LEFT JOIN Clarity_PRD_Report.dbo.CLARITY_DEP (nolock) on ORDER_MED.PAT_LOC_ID = CLARITY_DEP.DEPARTMENT_ID
    GROUP BY ORDER_MED_ID --,PAT.PAT_MRN_ID
        ,ORDER_MED.CHNG_ORDER_MED_ID
        ,CLARITY_DEP.DEPARTMENT_NAME    
        ,ORDERING_DATE  
        )
Select * 
Into #CTE_OrderID
From CTE_OrderID
;
With q as (
    Select CURRENT_ORDERID
           ,PREV_ORDERID
           ,PREV_ORDERID as Last_ORDERID
           ,WRITTEN_LOCATION
           ,WRITTEN_LOCATION as ORIGINAL_WRITTEN_LOCATION
           ,WRITTEN_DATE
           ,WRITTEN_DATE as ORIGINAL_WRITTEN_DATE
           ,0 NumOfTimesRewrite
           
           
    From #CTE_OrderID
        where CURRENT_ORDERID in (Select OrderID from @ORDERID) 

    UNION ALL 
    SELECT  q.CURRENT_ORDERID
            ,u.PREV_ORDERID
            ,q.PREV_ORDERID LAST_ORDERID
            ,q.WRITTEN_LOCATION
            ,u.WRITTEN_LOCATION
            ,q.WRITTEN_DATE
            ,u.WRITTEN_DATE
            ,NumOfTimesRewrite+1

    FROM    q
            inner JOIN #CTE_OrderID u ON u.CURRENT_ORDERID = q.PREV_ORDERID
            --and q.ORDER_MED_ID in (select orderID from @ORDERID)
)

Select CURRENT_ORDERID
    ,Coalesce(LAST_ORDERID,CURRENT_ORDERID) as ORIGINAL_ORDERID
    ,WRITTEN_LOCATION
    ,ORIGINAL_WRITTEN_LOCATION
    ,WRITTEN_DATE
    ,ORIGINAL_WRITTEN_DATE
    ,NumOfTimesRewrite

FROM q
WHERE PREV_ORDERID is null
order by CURRENT_ORDERID

I appreciate any help on this!


Solution

  • See simple example, without check for circular references.
    If you have circular references, you mast throw error, or stop on that node. Or you can hope that you don't have cycles in table:). Or do additional check the path through the nodes.

    WITH q AS (
        SELECT 0 lvl,  ID, PreviousID,PreviousID LastId
            ,Location,Location as OriginalLocation
        FROM    @IDs
        where ID in (select OrderID from @ORDERID) 
        UNION ALL 
        SELECT lvl+1, q.ID,u.PreviousId,q.PreviousId LastId
           ,q.Location,u.Location
        FROM    q
                INNER JOIN @IDs u ON u.ID = q.PreviousID
    )
    select lvl, ID, coalesce(LastId,Id) OriginalId,Location,OriginalLocation 
    from q
    where PreviousId is null
    order by id;
    

    Output is

    lvl ID OriginalId Location OriginalLocation
    0 2 2 1235 1235
    2 4 2 1239 1235
    1 9 8 1234 1237

    Before filter

    lvl ID PreviousID LastId Location OriginalLocation
    0 2 null null 1235 1235
    0 4 3 3 1239 1239
    1 4 2 3 1239 1236
    2 4 null 2 1239 1235
    0 9 8 8 1234 1234
    1 9 null 8 1234 1237

    fiddle

    Update1. With your comment about source tables
    see example

    With q as (
      Select ORDER_MED_ID CURRENT_ORDERID
       ,CHNG_ORDER_MED_ID PREV_ORDERID
       ,CHNG_ORDER_MED_ID as Last_ORDERID          
       ,PAT_LOC_ID as WRITTEN_LOCATION_ID
       ,PAT_LOC_ID as ORIGINAL_WRITTEN_LOCATION_ID
       ,0 NumOfTimesRewrite                     
      From ORDER_MED        
      where ORDER_MED_ID in (4,9,2) -- (Select OrderID from @ORDERID)      
      UNION ALL      
      SELECT  q.CURRENT_ORDERID
       ,u.CHNG_ORDER_MED_ID PREV_ORDERID            
       ,q.PREV_ORDERID LAST_ORDERID             
       ,q.WRITTEN_LOCATION_ID
       ,u.PAT_LOC_ID ORIGINAL_WRITTEN_LOCATION_ID
       ,NumOfTimesRewrite+1      
      FROM q inner JOIN ORDER_MED u 
         ON u.ORDER_MED_ID = q.PREV_ORDERID          
    )
    Select CURRENT_ORDERID  
      ,Coalesce(LAST_ORDERID,CURRENT_ORDERID) as ORIGINAL_ORDERID   
      ,WRITTEN_LOCATION_ID  
      ,ORIGINAL_WRITTEN_LOCATION_ID
      ,NumOfTimesRewrite
      ,d1.department_name written_location
      ,d2.department_name original_written_location
    from q
    left join clarity_dep d1 on d1.department_id=q.written_location_id
    left join clarity_dep d2 on d2.department_id=q.original_written_location_id
    where prev_orderid is null
    order by q.current_orderid
    

    Temporary tables not necessary.

    Demo fiddle