How to generate ids and parent_ids from the arrays of categories. The number or depth of subcategories can be anything between 1-10 levels.
Example PostgreSQL column. Datatype character varying array.
data_column
character varying[] |
----------------------------------
[root_1, child_1, childchild_1] |
[root_1, child_1, childchild_2] |
[root_2, child_2] |
I would like to convert the column of arrays into the table as shown below that I assume is called the Adjacency List Model. I know there is also the Nested Tree Sets Model and Materialised Path model.
Final output table
id | title | parent_id
------------------------------
1 | root_1 | null
2 | root_2 | null
3 | child_1 | 1
4 | child_2 | 2
5 | childchild_1 | 3
6 | childchild_2 | 3
Final output tree hierarchy
root_1
--child_1
----childchild_1
----childchild_2
root_2
--child_2
You can do this with a recursive CTE
WITH RECURSIVE cte AS
( SELECT data[1] as title, 2 as idx, null as parent, data FROM t -- 1
UNION
SELECT data[idx], idx + 1, title, data -- 2
FROM cte
WHERE idx <= cardinality(data)
)
SELECT DISTINCT -- 3
title,
parent
FROM cte
DISTINCT
removes tied elements (e.g. two times the same root_1
).Now you have created the hierarchy. Now you need the ids.
You can generate them in many different ways, for example using the row_number()
window function:
WITH RECURSIVE cte AS (...)
SELECT
*,
row_number() OVER ()
FROM (
SELECT DISTINCT
title,
parent
FROM cte
) s
Now, every row has its own id. The order criterion may be tweaked a little. Here we have only little chance to change this without any further information. But the algorithm stays the same.
With the ids of each column, we can create a self join to join the parent id by using the parent
title column. Because a self join is a repetition of the select query, it makes sense to encapsulate it into a second CTE to avoid code replication. The final result is:
WITH RECURSIVE cte AS
( SELECT data[1] as title, 2 as idx, null as parent, data FROM t
UNION
SELECT data[idx], idx + 1, title, data
FROM cte
WHERE idx <= cardinality(data)
), numbered AS (
SELECT
*,
row_number() OVER ()
FROM (
SELECT DISTINCT
title,
parent
FROM cte
) s
)
SELECT
n1.row_number as id,
n1.title,
n2.row_number as parent_id
FROM numbered n1
LEFT JOIN numbered n2 ON n1.parent = n2.title