Search code examples
sqlnode.jspostgresqltreehierarchical-data

Creating a category tree table from an array of categories in PostgreSQL


How to generate ids and parent_ids from the arrays of categories. The number or depth of subcategories can be anything between 1-10 levels.

Example PostgreSQL column. Datatype character varying array.

data_column
character varying[]             |               
----------------------------------
[root_1, child_1, childchild_1] |
[root_1, child_1, childchild_2] | 
[root_2, child_2]               | 

I would like to convert the column of arrays into the table as shown below that I assume is called the Adjacency List Model. I know there is also the Nested Tree Sets Model and Materialised Path model.

Final output table

id | title        | parent_id
------------------------------
1  | root_1       | null
2  | root_2       | null  
3  | child_1      | 1
4  | child_2      | 2 
5  | childchild_1 | 3  
6  | childchild_2 | 3   

Final output tree hierarchy

root_1
--child_1
----childchild_1
----childchild_2
root_2
--child_2

Solution

  • step-by-step demo: db<>fiddle

    You can do this with a recursive CTE

    WITH RECURSIVE cte AS 
    (  SELECT data[1] as title, 2 as idx, null as parent, data FROM t   -- 1
    
       UNION
    
       SELECT data[idx], idx + 1, title, data                           -- 2
       FROM cte
       WHERE idx <= cardinality(data)
    )
    SELECT DISTINCT                                                     -- 3 
        title,
        parent
    FROM cte
    
    1. The starting query of the recursion: Get all root elements and data you'll need within the recursion
    2. The recursive part: Get element of new index and increase the index
    3. After recursion: Query the columns you finally need. The DISTINCT removes tied elements (e.g. two times the same root_1).

    Now you have created the hierarchy. Now you need the ids. You can generate them in many different ways, for example using the row_number() window function:

    WITH RECURSIVE cte AS (...)
    SELECT 
        *,
        row_number() OVER ()
    FROM (
        SELECT DISTINCT
            title,
            parent
        FROM cte
    ) s
    

    Now, every row has its own id. The order criterion may be tweaked a little. Here we have only little chance to change this without any further information. But the algorithm stays the same.

    With the ids of each column, we can create a self join to join the parent id by using the parent title column. Because a self join is a repetition of the select query, it makes sense to encapsulate it into a second CTE to avoid code replication. The final result is:

    WITH RECURSIVE cte AS 
    (  SELECT data[1] as title, 2 as idx, null as parent, data FROM t
    
       UNION
    
       SELECT data[idx], idx + 1, title, data
       FROM cte
       WHERE idx <=  cardinality(data)
    ), numbered AS (
       SELECT 
           *,
           row_number() OVER ()
       FROM (
           SELECT DISTINCT
               title,
               parent
           FROM cte
       ) s
    )
    SELECT 
        n1.row_number as id,
        n1.title,
        n2.row_number as parent_id
    FROM numbered n1
    LEFT JOIN numbered n2 ON n1.parent = n2.title