Search code examples
postgresqlrecursive-query

About Recursive SQL queries, their performance and Infinite recursion


I want to construct a "recursive" query to use on my PostgreSQL database, the idea is as basic as you would imagine :)

SELECT sourceid, destinationid   FROM trail.log
WHERE sourceid = 'T0'
OR sourceid in (SELECT destinationid FROM trail.log where sourceid ='T0')
OR sourceid in (SELECT destinationid FROM trail.log where sourceid in ( you see where I want to go ... )
OR ...

According to the internet, here is what I should do:

WITH cte_traillog AS (
    SELECT       
        sourceid, destinationid       
    FROM       
        trail.log
    WHERE sourceid = 'T0'
    UNION ALL
    SELECT 
        e.sourceid, e.destinationid
    FROM 
        trail.log e
        INNER JOIN cte_org o 
            ON o.destinationid = e.sourceid
)
SELECT * FROM cte_traillog;

Knowing the the first query replies in less than a minute on my server, will the second one have the same performance? (here below some probably stupid questions) If my first query is not causing much trouble, is there a risque that the second will crash the server? What if the data will cause an infinite loop? Is there a way to prevent infinite loops? More generally is it this the correct approach?

Thanks in advance for your time.

Have a good day.


Solution

  • There is a mistake in your query: you used a wrong name when referencing the CTE in the recursive branch. Other than that, your query looks fine.

    The run time of your query will depend on how deep the hierarchy is, that is, how often the "recursive" (really: iterative) part is executed. An index can make it very fast.

    If the hierarchy contains cycles, the recursion will never stop, and you will eventually get a stack overflow error. To prevent that, you can use UNION instead of UNION ALL, which will eliminate duplicates.