sql node.js postgresql tree hierarchical-data

Creating a category tree table from an array of categories in PostgreSQL

How to generate ids and parent_ids from the arrays of categories. The number or depth of subcategories can be anything between 1-10 levels.

Example PostgreSQL column. Datatype character varying array.

data_column
character varying[]             |               
----------------------------------
[root_1, child_1, childchild_1] |
[root_1, child_1, childchild_2] | 
[root_2, child_2]               |

I would like to convert the column of arrays into the table as shown below that I assume is called the Adjacency List Model. I know there is also the Nested Tree Sets Model and Materialised Path model.

Final output table

id | title        | parent_id
------------------------------
1  | root_1       | null
2  | root_2       | null  
3  | child_1      | 1
4  | child_2      | 2 
5  | childchild_1 | 3  
6  | childchild_2 | 3

Final output tree hierarchy

root_1
--child_1
----childchild_1
----childchild_2
root_2
--child_2

Solution

step-by-step demo: db<>fiddle

You can do this with a recursive CTE

WITH RECURSIVE cte AS 
(  SELECT data[1] as title, 2 as idx, null as parent, data FROM t   -- 1

   UNION

   SELECT data[idx], idx + 1, title, data                           -- 2
   FROM cte
   WHERE idx <= cardinality(data)
)
SELECT DISTINCT                                                     -- 3 
    title,
    parent
FROM cte

The starting query of the recursion: Get all root elements and data you'll need within the recursion
The recursive part: Get element of new index and increase the index
After recursion: Query the columns you finally need. The DISTINCT removes tied elements (e.g. two times the same root_1).

Now you have created the hierarchy. Now you need the ids. You can generate them in many different ways, for example using the row_number() window function:

WITH RECURSIVE cte AS (...)
SELECT 
    *,
    row_number() OVER ()
FROM (
    SELECT DISTINCT
        title,
        parent
    FROM cte
) s

Now, every row has its own id. The order criterion may be tweaked a little. Here we have only little chance to change this without any further information. But the algorithm stays the same.

With the ids of each column, we can create a self join to join the parent id by using the parent title column. Because a self join is a repetition of the select query, it makes sense to encapsulate it into a second CTE to avoid code replication. The final result is:

WITH RECURSIVE cte AS 
(  SELECT data[1] as title, 2 as idx, null as parent, data FROM t

   UNION

   SELECT data[idx], idx + 1, title, data
   FROM cte
   WHERE idx <=  cardinality(data)
), numbered AS (
   SELECT 
       *,
       row_number() OVER ()
   FROM (
       SELECT DISTINCT
           title,
           parent
       FROM cte
   ) s
)
SELECT 
    n1.row_number as id,
    n1.title,
    n2.row_number as parent_id
FROM numbered n1
LEFT JOIN numbered n2 ON n1.parent = n2.title