Search code examples
sqlpostgresqljsonb

Extract values from a list inside a nested json dictionary in PostgreSQL


I have a table in PostgreSQL which looks like:

+----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | result                                                                                                                                                                                                              |
+----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1  | {"condition": "OR", "duplicate_ids": {"book_name": {"value": "-", "duplicates": ["afec4e99", "4c86040b"]}, "author": {"value": "-", "duplicates": ["4c86040b", "362a883d", "7856b483", "7b091646", "5c99eea3", "0fa4f47f"]}}} |
| 1  | {"condition": "OR", "duplicate_ids": {"author": {"value": "some_name", "duplicates": ["69e139df"]}}}                                                                                                                |
| 2  | {"condition": "OR", "duplicate_ids": {"author": {"value": "other_name", "duplicates": ["7fa13aa8"]}}}                                                                                                               |
| 3  | {"condition": "OR", "duplicate_ids": {"publisher": {"value": "publisher_name", "duplicates": ["2b69af3d"]}}}                                                                                                        |
+----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I am attempting to extract all unique values that are part of the duplicates list for id = 1. The challenge is that the key inside duplicate_ids will change frequently, meaning it has to be dynamic. In this example, I don't know beforehand that the keys inside duplicate_ids are book_name and author for id = 1.

So, the output should be:

+------------+
| duplicates |
+------------+
| afec4e99   |
| 4c86040b   |
| 362a883d   |
| 7856b483   |
| 7b091646   |
| 5c99eea3   |
| 0fa4f47f   |
| 69e139df   |
+------------+

Is there any way of achieving this using PostgreSQL query?


Solution

  • You can use jsonb_each() to extract all elements as key/value pairs. Then you can get the duplicates element :

    This is to get duplicates element as JSONB array :

    SELECT
        e.value -> 'duplicates'
    FROM mytable t
    CROSS JOIN jsonb_each(result->'duplicate_ids') e
    WHERE t.id = 1
    

    Results :

    ["4c86040b", "362a883d", "7856b483", "7b091646", "5c99eea3", "0fa4f47f"]
    ["afec4e99", "4c86040b"]
    ["69e139df"]
    

    This is to convert the JSONB array into individual rows by using the jsonb_array_elements_text() function :

    SELECT jsonb_array_elements_text(e.value -> 'duplicates') AS duplicates 
    FROM mytable t
    CROSS JOIN jsonb_each(result->'duplicate_ids') e 
    WHERE t.id = 1;
    

    Results :

    duplicates
    4c86040b
    362a883d
    7856b483
    7b091646
    5c99eea3
    0fa4f47f
    afec4e99
    4c86040b
    69e139df
    

    Demo here