Search code examples
sqlpostgresqlcommon-table-expression

Postgresql recursively parsing nested JSON


I'm trying to parse a nested JSON in Postgres, which has the following structure:

{
    "id": 0,
    "children": [{
            "id": 965,
            "children": [{
                    "id": 967
                },
                {
                    "id": 969
                },
                {
                    "id": 971
                },
                {
                    "id": 973
                }
            ]
        },
        {
            "id": 974,
            "children": [{
                    "id": 976
                },
                {
                    "id": 978
                }
            ]
        }
    ]
}  

My desired output form is this:

-------------------------------------------
|parent_id          |child_id             |
-------------------------------------------
|0                  |965                  |
|0                  |974                  |
|965                |967                  |
|965                |969                  |
|965                |971                  |
|965                |973                  |
|974                |976                  |
|974                |978                  |
-------------------------------------------  

I am trying to use RECURSIVE CTEs to do so, however, with whatever I've tried, I run into either could not identify an equality operator for type json or ERROR: set-returning functions are not allowed in CASE. I'm able to get the data for one level, but that's far from what I intend to do, so any suggestions as to what I could do differently would be of great help.

Here is a how you can quickly try it out if needed, along with all what've I've tried and the errors those raised. Also it'd be a bonus if your suggestions work for json and jsonb both.

Thanks

Edit: The number of levels of the JSON isn't fixed, it could be upto any level (max 3-4).
Edit: Adding queries I tried here itself:

with recursive cte(id,json_element) as (
select tree->'id',tree->'children' from json_test
union
select json_element->'id',json_element->'children' from cte
) select * from cte;
--could not identify an equality operator for type json

WITH RECURSIVE cte(id, children) AS (
  SELECT
    tree->'id' as id,
    tree->'children' as children
  FROM json_test
  UNION ALL
  SELECT
    children -> 'id',
    children -> 'children'
  FROM cte,
    --json_each(CASE WHEN json_typeof(cte.children) <> 'object' THEN '{}' :: JSON ELSE cte.children END) AS t
    json_each(CASE WHEN json_typeof(cte.children) = 'array' THEN json_array_elements(cte.children) ELSE cte.children END) AS t
)
SELECT * FROM cte WHERE json_typeof(cte.children) <> 'object';
 --ERROR: set-returning functions are not allowed in CASE
  --Hint: You might be able to move the set-returning function into a LATERAL FROM item.  

I also tried multiple changes to these queries but ended up with the same errors in the end.


Solution

  • demo: db<>fiddle

    WITH RECURSIVE jsondata AS (
        SELECT 
            data ->> 'id' as parent_id,
            data -> 'children' as children 
        FROM (
            SELECT '{"id": 0, "children": [{"id": 965, "children": [{"id": 967}, {"id": 969}, {"id": 971}, {"id": 973}]}, {"id": 974, "children": [{"id": 976}, {"id": 978}]}]}'::jsonb as data
        ) s
    
        UNION
    
        SELECT
            value ->> 'id',
            value -> 'children'
        FROM jsondata, jsonb_array_elements(jsondata.children)
    
    )
    SELECT 
        parent_id, 
        jsonb_array_elements(children) ->> 'id' 
    FROM jsondata 
    WHERE children IS NOT NULL
    

    This is straight forward: Get the id of the current element as parent_id. Give the children array as new json object.

    In the recursion part you have to expand the children array into one row per child. Not you can get the id and children as well.

    Because this is done until the last child node (which has no own children per definition) their children columns are NULL. So these rows can be filtered out. The last step is to expand the children json objects and read their id elements to list them in the way you expect.

    The columns are now of type text. Of course you could to an integer cast after all:

    parent_id::int, 
    (jsonb_array_elements(children) ->> 'id')::int