Search code examples
sqlpostgresqlcommon-table-expressionrecursive-querydirected-graph

WITH RECURSIVE query to choose the longest paths


I am new to WITH RECURSIVE in PostgreSQL. I have a reasonably standard recursive query that is following an adjacency list. If I have, for example:

1 -> 2
2 -> 3
3 -> 4
3 -> 5
5 -> 6

it produces:

1
1,2
1,2,3
1,2,3,4
1,2,3,5
1,2,3,5,6

What I would like is to have just:

1,2,3,4
1,2,3,5,6

But I can't see how to do this in Postgres. This would seem to be "choose the longest paths" or "choose the paths that are not contained in another path". I can probably see how to do this with a join on itself, but that seems quite inefficient.

An example query is:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (
   SELECT g.id, g.link, g.data, 1, ARRAY[g.id], false
   FROM graph g
  UNION ALL
   SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path)
   FROM graph g, search_graph sg
   WHERE g.id = sg.link AND NOT cycle
)
SELECT * FROM search_graph;

Solution

  • You already have a solution at your fingertips with cycle, just add a predicate at the end.

    But adjust your break condition by one level, currently you are appending one node too many:

    WITH RECURSIVE search AS (
       SELECT id, link, data, ARRAY[g.id] AS path, (link = id) AS cycle
       FROM   graph g
       WHERE  NOT EXISTS (SELECT FROM graph WHERE link = g.id)
    
       UNION ALL
       SELECT g.id, g.link, g.data, s.path || g.id, g.link = ANY(s.path)
       FROM   search s
       JOIN   graph g ON g.id = s.link
       WHERE  NOT s.cycle
       )
    SELECT *
    FROM   search
    WHERE cycle;
    -- WHERE cycle IS NOT FALSE;  -- alternative if link can be NULL
    
    • Also including a start condition like mentioned by @wildplasser.

    • Init condition for cycle is (link = id) to catch shortcut cycles. Not necessary if you have a CHECK constraint to disallow that in your table.

    • The exact implementation depends on the missing details.

    • This is assuming all graphs are terminated with a cycle or link IS NULL and there is a FK constraint from link to id in the same table. The exact implementation depends on missing details. If link is not actually a link (no referential integrity), you need to adapt ...