Search code examples
sqlpostgresqltreehierarchical-data

SQL Find all direct descendants in a tree


I have a tree in my database that is stored using parent id links.

A sample of what I have for data in the table is:

id | name        | parent id
---+-------------+-----------
0  | root        | NULL
1  | Node 1      | 0
2  | Node 2      | 0
3  | Node 1.1    | 1
4  | Node 1.1.1  | 3
5  | Node 1.1.2  | 3

Now I would like to get a list of all the direct descendants of a given node but if none exist I would like to have it just return the node itself.

I want the return for the query for children of id = 3 to be:

children
--------
4
5

Then the query for the children of id = 4 to be:

children
--------
4

I can change the way I am storing the tree to a nested set but I don't see how that would make the query I want possible.


Solution

  • In new PostgreSQL 8.4 you can do it with a CTE:

    WITH RECURSIVE q AS
            (
            SELECT  h, 1 AS level, ARRAY[id] AS breadcrumb
            FROM    t_hierarchy h
            WHERE   parent = 0
            UNION ALL
            SELECT  hi, q.level + 1 AS level, breadcrumb || id
            FROM    q
            JOIN    t_hierarchy hi
            ON      hi.parent = (q.h).id
            )
    SELECT  REPEAT('  ', level) || (q.h).id,
            (q.h).parent,
            (q.h).value,
            level,
            breadcrumb::VARCHAR AS path
    FROM    q
    ORDER BY
            breadcrumb
    

    See this article in my blog for details:

    In 8.3 or earlier, you'll have to write a function:

    CREATE TYPE tp_hierarchy AS (node t_hierarchy, level INT);
    
    CREATE OR REPLACE FUNCTION fn_hierarchy_connect_by(INT, INT)
    RETURNS SETOF tp_hierarchy
    AS
    $$
            SELECT  CASE
                    WHEN node = 1 THEN
                            (t_hierarchy, $2)::tp_hierarchy
                    ELSE
                            fn_hierarchy_connect_by((q.t_hierarchy).id, $2 + 1)
                    END
            FROM    (
                    SELECT  t_hierarchy, node
                    FROM    (
                            SELECT  1 AS node
                            UNION ALL
                            SELECT  2
                            ) nodes,
                            t_hierarchy
                    WHERE   parent = $1
                    ORDER BY
                            id, node
                    ) q;
    $$
    LANGUAGE 'sql';
    

    and select from this function:

    SELECT  *
    FROM    fn_hierarchy_connect_by(4, 1)
    

    The first parameter is the root id, the second should be 1.

    See this article in my blog for more detail:

    Update:

    To show only the first level children, or the node itself if the children do not exist, issue this query:

    SELECT  *
    FROM    t_hierarchy
    WHERE   parent = @start
    UNION ALL
    SELECT  *
    FROM    t_hierarchy
    WHERE   id = @start
            AND NOT EXISTS
            (
            SELECT  NULL
            FROM    t_hierarchy
            WHERE   parent = @start
            )
    

    This is more efficient than a JOIN, since the second query will take but two index scans at most: the first one to make sure to find out if a child exists, the second one to select the parent row if no children exist.