Search code examples
sqlsql-serversql-server-2008graph-theorysql-graph

Efficiently querying a directed/undirected table of graph edges in SQL Server


I've got a SQL server table in which each row represents an edge in a graph network. The FromNodeID and ToNodeID are foreign keys to a node table, and the schema is something like this:

CREATE TABLE #Edges (
  EdgeID int identity (1,1),
  FromNodeID int,
  ToNodeID int
  );

INSERT INTO #Edges (FromNodeID, ToNodeID) VALUES
  (1,2),
  (1,3),
  (1,4),
  (2,3),
  (3,5),
  (4,5),
  (5,6);

Now, if I consider each edge to be directed (i.e., one way), then it's easy to work out all those nodes that I can get to directly from any node. I'd add an index to the FromNodeID column, and then run a query like this:

SELECT ToNodeID FROM #Edges WHERE FromNodeID = 3

Result: 5

But what would be the best way to structure my table/query if I want to treat each edge as unidirectional. i.e. starting from node 3, I'd like to get the results:

Result: 1, 2, 5

The simplest way I can think of would be to add an additional index to the ToNodeID column and then run a query like this:

SELECT ToNodeID FROM #Edges WHERE FromNodeID = 3 
UNION SELECT FromNodeID FROM #Edges WHERE ToNodeID = 3;

But this obviously involves combining result sets from two queries and doesn't seem very efficient - is there a better way to write this in a single query? (Note that I don't want to insert the reversed edges again into the table - I need to be able to treat the edges as either directed or undirected at runtime).

Thanks for any advice!


Solution

  • But this obviously involves combining result sets from two queries and doesn't seem very efficient - is there a better way to write this in a single query?

    This is efficient enough.

    You could do this:

    SELECT  CASE 3 WHEN FromNodeId THEN ToNodeId ELSE FromNodeId END
    FROM    Edges
    WHERE   3 IN (FromNodeId, ToNodeId)
    

    but this will be essentially the same (will UNION these indexes under the hood).

    Here's a script to test:

    CREATE TABLE #Edges
            (
            EdgeID INT IDENTITY (1,1) PRIMARY KEY,
            FromNodeID int NOT NULL,
            ToNodeID int NOT NULL
            )
    CREATE INDEX ix_edges_from ON #Edges (FromNodeID, ToNodeId)
    CREATE INDEX ix_edges_to ON #Edges (ToNodeID, FromNodeId)
    ;
    WITH    q (rn) AS
            (
            SELECT  1
            UNION ALL
            SELECT  rn + 1
            FROM    q
            WHERE   rn < 1000
            )
    INSERT
    INTO    #Edges (FromNodeId, ToNodeId)
    SELECT  q1.rn, q2.rn
    FROM    q q1
    CROSS JOIN
            q q2
    WHERE   (q1.rn + q2.rn) % 37 = 0
    OPTION (MAXRECURSION 0)
    

    For the UNION:

    SELECT  ToNodeId
    FROM    #Edges
    WHERE   FromNodeId = 3
    UNION
    SELECT  FromNodeId
    FROM    #Edges
    WHERE   ToNodeId = 3
    
    
      |--Stream Aggregate(GROUP BY:([Union1006]))
           |--Merge Join(Concatenation)
                |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[FromNodeID]=(3)) ORDERED FORWARD)
                |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[ToNodeID]=(3)) ORDERED FORWARD)
    

    For the IN:

      |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN (3)=[tempdb].[dbo].[#Edges].[FromNodeID] THEN [tempdb].[dbo].[#Edges].[ToNodeID] ELSE [tempdb].[dbo].[#Edges].[FromNodeID] END))
           |--Sort(DISTINCT ORDER BY:([tempdb].[dbo].[#Edges].[EdgeID] ASC))
                |--Concatenation
                     |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[FromNodeID]=(3)) ORDERED FORWARD)
                     |--Index Seek(OBJECT:([tempdb].[dbo].[#Edges]), SEEK:([tempdb].[dbo].[#Edges].[ToNodeID]=(3)) ORDERED FORWARD)
    

    As you can see, the plans are essentially the same: they both take values from the corresponding indexes and concatenate the results.

    The UNION query is in fact a little more efficient, since it uses a Merge Join to concatenate the results, and the records come out of the merge join naturally ordered, so the Stream Aggregate does not need to sort.