Search code examples
sqlsql-servercommon-table-expressionhierarchicalhierarchical-query

SQL Server CTE to find path from one ID to another ID


I have a table in a SQL Server 2008 R2 database that defines transitions between different states.

Example relevant table columns:

TransitionID | SourceStateID | DestinationStateID | WorkflowID
--------------------------------------------------------------
1            | 17            | 18                 | 3
2            | 17            | 21                 | 3
3            | 17            | 24                 | 3
4            | 17            | 172                | 3
5            | 18            | 17                 | 3
6            | 18            | 24                 | 3
7            | 18            | 31                 | 3
8            | 21            | 17                 | 3
9            | 21            | 20                 | 3
10           | 21            | 141                | 3
11           | 21            | 184                | 3

... etc.

My goal is to be able to define a starting StateID, an ending StateID, and the WorkflowID, and hopefully find a logical path from the starting StateID to the ending StateID within that workflow.

e.g. For the table above, if I supplied a starting StateID of 17, and ending StateID of 184, and a WorkflowID of 3, I could wind up with a 'path' of 17>21>184 (or more ideally, TransitionID 2 > TransitionID 11)

The Good:

  • The Starting and Ending StateID's defined will always have a possible path within the defined WorkflowID

The Bad:

  • There are certainly circular references on virtually every source/destination StateID (i.e. there's likely a transition from SourceStateID 1 to DestinationStateID 2, and also from SourceStateID 2 to DestinationStateID 1

  • There are certainly multiple possible paths from virtually any defined starting StateID and ending StateID

I realize it's some manner of CTE I'm after, but admittedly I find CTE's hard to grasp in general, and doubly so when circular references are a guaranteed problem.

The perfect solution would be one that selects the shortest path from the starting StateID to the ending StateID, but to be honest at this point if I can get something working that will reliably give me any valid path between the two states I'll be so happy.

Any SQL gurus out there have some direction you could point me in? I honestly don't even know where to begin to prevent circular issues like getting a path along the lines of 17>18>17>18>17>18...

DISGUSTING UPDATE I came up with this query which hurts me on an emotional level (with some help from this post: https://stackoverflow.com/a/11042012/3253311), but appears to be working.

DECLARE @sourceStatusId INT = 17
DECLARE @destinationStatusId INT = 24
DECLARE @workflowId INT = 3

DECLARE @sourceStatusIdString NVARCHAR(MAX) = CAST(@sourceStatusId AS             NVARCHAR(MAX))
DECLARE @destinationStatusIdString NVARCHAR(MAX) = CAST(@destinationStatusId     AS NVARCHAR(MAX))
DECLARE @workflowIdString NVARCHAR(MAX) = CAST(@workflowId AS NVARCHAR(MAX))

;WITH CTE ([Source], [Destination], [Sentinel]) AS 
(
SELECT
    CAST(t.[Source] AS NVARCHAR(MAX)) AS [Source],
    CAST(t.[Destination] AS NVARCHAR(MAX)) AS [Destination],
    [Sentinel] = CAST([Source] AS NVARCHAR(MAX))
FROM
    Transitions t
WHERE
    CAST([Source] AS NVARCHAR(MAX)) = @sourceStatusIdString AND 
    CAST([WorkflowID] AS NVARCHAR(MAX)) = @workflowIdString

UNION ALL

SELECT
    CAST(CTE.[Destination] AS NVARCHAR(MAX)),
    CAST(t.[Destination] AS NVARCHAR(MAX)),
    CAST([Sentinel] AS NVARCHAR(MAX)) + CAST('|' AS NVARCHAR(MAX)) + CAST(CTE.[Destination] AS NVARCHAR(MAX))
FROM
    CTE
    JOIN Transitions t
        ON CAST(t.[Source] AS NVARCHAR(MAX)) = CAST(CTE.[Destination] AS     NVARCHAR(MAX))
WHERE
    CHARINDEX(CTE.[Destination], Sentinel)=0
)

SELECT TOP 1
[Sentinel]
FROM
CTE
WHERE
LEFT([Sentinel], LEN(@sourceStatusIdString)) = @sourceStatusIdString AND 
RIGHT([Sentinel], LEN(@destinationStatusIdString)) =     @destinationStatusIdString 
ORDER BY
LEN([Sentinel])

Solution

  • Similar to Quassnoi's answer, but prevents circular references that start after the first element:

    DECLARE @src int = 18, @dst int = 184;
    WITH cte  AS 
    (Select TransitionId, SourceStateId, DestinationStateID, SourceStateID as StartingState, 0 as Depth
      , cast(TransitionId as varchar(max)) + ','  as IDPath  
      , cast(SourceStateID as varchar(max)) + ',' as States
     FROM Transitions
     WHERE SourceStateId = @src
     UNION ALL
     Select t.TransitionId, t.SourceStateId, t.DestinationStateID, StartingState, cte.Depth + 1
      , cte.IDPath + cast(t.TransitionId as varchar(max)) + ',' 
      , cte.States + cast(t.SourceStateID as varchar(max)) + ','
     FROM cte JOIN Transitions t on cte.DestinationStateID = t.SourceStateId 
     and CHARINDEX(',' + cast(t.SourceStateID as varchar(max)) + ',', States) = 0 --prevent loop starting after first element
     and t.DestinationStateID <> StartingState --prevent loop starting with first element
     where cte.Depth < 10 -- prevent going too deep
     )
     select TransitionId, SourceStateId, DestinationStateID, Depth, left(IDPath, len(IDPath) - 1) IDPath, left(States, len(States) - 1) States
     from cte
     where DestinationStateID = @dst
     order by Depth