Search code examples
sqlpostgresqlcommon-table-expressionrecursive-querytransitive-closure-table

Recursive query used for transitive closure


I've created a simple example to illustrate transitive closure using recursive queries in PostgreSQL.

However, something is off with my recursive query. I'm not familiar with the syntax, yet, so this request may be entirely noobish of me. If you run the query, you will see that node 1 repeats itself in the path results. How to fix it?

             1
           /   \
          2     3
         / \   /
        4  5  6
       /
      7
     / \
    8   9
create table account(
acct_id INT,
parent_id INT REFERENCES account(acct_id),
acct_name VARCHAR(100),
PRIMARY KEY(acct_id)
);

insert into account (acct_id, parent_id, acct_name) values (1,1,'account 1');
insert into account (acct_id, parent_id, acct_name) values (2,1,'account 2');
insert into account (acct_id, parent_id, acct_name) values (3,1,'account 3');
insert into account (acct_id, parent_id, acct_name) values (4,2,'account 4');
insert into account (acct_id, parent_id, acct_name) values (5,2,'account 5');
insert into account (acct_id, parent_id, acct_name) values (6,3,'account 6');
insert into account (acct_id, parent_id, acct_name) values (7,4,'account 7');
insert into account (acct_id, parent_id, acct_name) values (8,7,'account 8');
insert into account (acct_id, parent_id, acct_name) values (9,7,'account 9');

WITH RECURSIVE search_graph(acct_id, parent_id, depth, path, cycle) AS (
        SELECT g.acct_id, g.parent_id, 1,
          ARRAY[g.acct_id],
          false
        FROM account g
      UNION ALL
        SELECT g.acct_id, g.parent_id, sg.depth + 1,
          path || g.acct_id,
          g.acct_id = ANY(path)
        FROM account g, search_graph sg
        WHERE g.acct_id = sg.parent_id AND NOT cycle
)
SELECT path[1] as Child,parent_id as Parent,path || parent_id as path FROM search_graph
ORDER BY path[1],depth;

Solution

  • You can simplify (assuming acct_id and parent_id are NOT NULL):

    WITH RECURSIVE search_graph AS (
       SELECT parent_id, ARRAY[acct_id] AS path
       FROM   account
    
       UNION  ALL
       SELECT g.parent_id, sg.path || g.acct_id
       FROM   search_graph sg
       JOIN   account g ON g.acct_id = sg.parent_id 
       WHERE  g.acct_id <> ALL(sg.path)
       )
    SELECT path[1] AS child
         , path[array_upper(path,1)] AS parent
         , path
    FROM   search_graph
    ORDER  BY path;
    

    The columns acct_id, depth, cycle are just noise in your query.

    The WHERE condition has to stop recursion one step earlier, before the duplicate entry from the top node is in the result. That was an "off-by-one" in your original.

    The rest is formatting.

    If you know the only possible circle in your graph is a self-reference, we can have that cheaper:

    WITH RECURSIVE search_graph AS (
       SELECT parent_id, ARRAY[acct_id] AS path, acct_id <> parent_id AS keep_going
       FROM   account
    
       UNION  ALL
       SELECT g.parent_id, sg.path || g.acct_id, g.acct_id <> g.parent_id AS keep_going
       FROM   search_graph sg
       JOIN   account g ON g.acct_id = sg.parent_id 
       WHERE  keep_going
       )
    SELECT path[1] AS child
         , path[array_upper(path,1)] AS parent
         , path
    FROM   search_graph
    ORDER  BY path;
    

    fiddle
    Old sqlfiddle