
"Backwards recursive" query in PostgreSQL?


My question is related to my older (and successfully answered) question, "Recursive count query in PostgreSQL". I am now facing a similar problem, and I am not sure whether it can be solved at all. So, as last time, I hope you guys can save me! :-)

The new task:

I need to determine the root entity ID of a posted comment. As in my previous question, the table structure is identical. But now I am given a comment ID (cid) and have to determine the entity ID of the node it belongs to.

The simplest case is that the row for the given comment ID already holds the node ID, so I can read the root entity ID directly from that row.

It gets complicated when the given comment ID belongs to a comment that was posted as a reply to another comment. Again, the simplest case would be that the referenced comment is already a "level 1" comment and contains the root entity ID. But it gets really messy, since comments can be posted at any level of the "comment tree".

The table structure:

cid | entity_type | entity_id | comment
----+-------------+-----------+--------------------------------------
  1 | node        |         1 | initial comment on node with id 1
  2 | comment     |         1 | reply to comment with id 1
  3 | comment     |         2 | reply to comment with id 2
  4 | comment     |         1 | second reply on first comment
  5 | node        |         2 | comment on another node
  6 | node        |         1 | second direct child comment on node 1
  7 | node        |         3 | comment on a third node

The nodes themselves are not part of this table. Visualized as a hierarchy, the data above looks like this:

node with id 1
 |- (cid: 1) initial comment on node with id 1
 |   |- (cid: 2) reply to comment with id 1
 |   |   |- (cid: 3) reply to comment with id 2
 |   |- (cid: 4) second reply on first comment
 |- (cid: 6) second direct child comment on node 1
node with id 2
 |- (cid: 5) comment on another node
node with id 3
 |- (cid: 7) comment on a third node
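
To reproduce the example, the sample data could be loaded with something like the following. This is only a minimal sketch: the table name foo and the column types are assumptions, since the post shows just the column names and values.

```sql
-- Hypothetical setup: table name and column types are assumptions.
create table foo (
    cid         int primary key,
    entity_type text not null,   -- 'node' or 'comment'
    entity_id   int  not null,   -- node id, or cid of the parent comment
    comment     text
);

insert into foo (cid, entity_type, entity_id, comment) values
    (1, 'node',    1, 'initial comment on node with id 1'),
    (2, 'comment', 1, 'reply to comment with id 1'),
    (3, 'comment', 2, 'reply to comment with id 2'),
    (4, 'comment', 1, 'second reply on first comment'),
    (5, 'node',    2, 'comment on another node'),
    (6, 'node',    1, 'second direct child comment on node 1'),
    (7, 'node',    3, 'comment on a third node');
```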
 

So, for example, I need to determine the node ID for the provided cid "3". As the table shows, the comment with ID 3 is an answer to the comment with ID 2, which itself is an answer to the comment with ID 1, which holds the root entity ID "1" in its row, since that comment was posted on a node, not on a comment. The result for this example would then be "1".

I again think this could be solved by a recursive query, but one "bubbling up the tree" instead of starting at the root. This time, though, I cannot even find a starting point for phrasing my query.

Any help would be greatly appreciated! Thanks in advance for taking the time to read my problem description!


Solution

  • Start at the target row and keep following entity_id up to the parent comment for as long as the current row's entity_type is not 'node'. The recursion stops once it reaches a row with cte.entity_type = 'node', and the outer query asks for exactly that row:
    demo at db<>fiddle

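    create function get_foo_root_node(int)
    returns setof foo stable parallel safe as
    $f$ with recursive cte as(
          -- anchor: the comment whose root is wanted
          select foo.* 
          from foo
          where cid = $1
        
          union all
        
          -- step: move up to the parent comment, but only while the
          -- current row is itself a reply (entity_type <> 'node')
          select foo.* 
          from foo 
          join cte 
            on cte.entity_id = foo.cid
          where cte.entity_type <> 'node'
        )
        -- of all visited rows, keep only the top-level comment on a node
        select cte.* 
        from cte
        where entity_type = 'node';
    $f$ language sql;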
    
    select foo.*
          ,root.entity_id as root_entity_id
          ,root.comment as root_comment
    from foo 
    cross join lateral get_foo_root_node(foo.cid) as root;
    
    cid | entity_type | entity_id | comment                               | root_entity_id | root_comment
    ----+-------------+-----------+---------------------------------------+----------------+--------------------------------------
      1 | node        |         1 | initial comment on node with id 1     |              1 | initial comment on node with id 1
      2 | comment     |         1 | reply to comment with id 1            |              1 | initial comment on node with id 1
      3 | comment     |         2 | reply to comment with id 2            |              1 | initial comment on node with id 1
      4 | comment     |         1 | second reply on first comment         |              1 | initial comment on node with id 1
      5 | node        |         2 | comment on another node               |              2 | comment on another node
      6 | node        |         1 | second direct child comment on node 1 |              1 | second direct child comment on node 1
      7 | node        |         3 | comment on a third node               |              3 | comment on a third node

    The function above returns the root row for a single cid; the test query only demonstrates that it works for every row in the table.
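
    For the single-comment case from the question, a call could look like the sketch below. It assumes the function get_foo_root_node has been created as shown above; the alias root_node_id is only illustrative.

    ```sql
    -- Root node ID for the comment with cid 3.
    select entity_id as root_node_id
    from get_foo_root_node(3);
    ```

    With the sample data this returns 1, matching the row for cid 3 in the test output above.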

    If your goal is actually to return all rows together with their root node, you can do it directly:

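    with recursive cte as(
          -- anchor: every row starts out as its own provisional root
          select foo.*
                ,cid as root_cid
                ,entity_type as root_entity_type
                ,entity_id as root_entity_id
                ,comment as root_comment
          from foo
        
          union all
        
          -- step: keep the original row's columns and replace the
          -- root_* columns with those of the parent comment
          select cte.cid
                ,cte.entity_type
                ,cte.entity_id
                ,cte.comment
                ,foo.* 
          from foo 
          join cte 
            on cte.root_entity_id = foo.cid
          where cte.root_entity_type <> 'node'
    )
    -- keep only the rows whose root_* columns have reached a node-level comment
    select *
    from cte
    where root_entity_type = 'node'
    order by cid;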
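
    One caveat with union all: if the data ever contained a reply cycle (which the sample data does not), the recursion would never terminate. Below is a hedged sketch of a guarded variant for a single cid. The visited and depth columns are illustrative additions that are not part of the original table, and the literal 3 stands in for whichever cid is being looked up.

    ```sql
    with recursive cte as (
          -- anchor: start at the requested comment and remember its cid
          select foo.*
                ,array[foo.cid] as visited
                ,0 as depth
          from foo
          where cid = 3

          union all

          -- step: climb to the parent, skipping any cid already visited
          select foo.*
                ,cte.visited || foo.cid
                ,cte.depth + 1
          from foo
          join cte
            on cte.entity_id = foo.cid
          where cte.entity_type <> 'node'
            and foo.cid <> all (cte.visited)
    )
    select entity_id as root_node_id
          ,cid as root_cid
          ,depth
    from cte
    where entity_type = 'node';
    ```

    For cid 3 in the sample data this yields root_node_id 1, root_cid 1 and depth 2, since the walk goes 3 → 2 → 1.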