Convert jsonb comma separated values into a json object using a psql script


I have a table in postgresql that has two columns:

               Table "schemaname.tablename"
 Column |       Type        | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
 _key   | character varying |           | not null |
 value  | jsonb             |           |          |
Indexes:
    "tablename_pkey" PRIMARY KEY, btree (_key)

and I'd like to convert a nested property value of the jsonb that looks like this:

{
    "somekey": "[k1=v1, k2=v2, k3=v2]"
}

into this:

{
    "somekey":  [
        "java.util.LinkedHashMap",
        {
            "k1": "v1",
            "k2": "v2",
            "k3": "v2"
        }
    ]
}

I've managed to parse the comma-separated string into an array of strings, but I still have to apply another split on '='. Beyond that, I don't really know how to run the actual UPDATE on all rows of the table and generate the proper jsonb value for the "somekey" key.

select regexp_split_to_array(RTRIM(LTRIM(value->>'somekey','['),']'),',') from schemaname.tablename;

Any ideas?


Solution

  • Try this one (self-contained test data):

    WITH tablename (_key, value) AS (
        VALUES
            ('test', '{"somekey":"[k1=v1, k2=v2, k3=v2]"}'::jsonb),
            ('second', '{"somekey":"[no one=wants to, see=me, with garbage]"}'::jsonb),
            ('third', '{"somekey":"[some,key=with a = in it''s value, some=more here]"}'::jsonb)
        )
    SELECT
        tab._key,
        jsonb_insert(
            '{"somekey":["java.util.LinkedHashMap"]}', -- basic JSON structure
            '{somekey,0}', -- path to insert after
            jsonb_object( -- create a JSONB object on-the-fly from the key-value array
                array_agg(key_values) -- aggregate all key-value rows into one array
            ),
            true -- we want to insert after the matching element, not before it
        ) AS json_transformed
    FROM
        tablename AS tab,
        -- the following is an implicit LATERAL join (the function is evaluated for each row of the preceding table)
        regexp_matches( -- produces multiple rows
            btrim(tab.value->>'somekey', '[]'), -- as you started with
            '(\w[^=]*)=([^,]*)', -- define regular expression groups for keys and values
            'g' -- we want all key-value sets
        ) AS key_values
    GROUP BY 1
    ;
    

    ...resulting in:

      _key  |                                           json_transformed                                            
    --------+-------------------------------------------------------------------------------------------------------
     second | {"somekey": ["java.util.LinkedHashMap", {"see": "me", "no one": "wants to"}]}
     third  | {"somekey": ["java.util.LinkedHashMap", {"some": "more here", "some,key": "with a = in it's value"}]}
     test   | {"somekey": ["java.util.LinkedHashMap", {"k1": "v1", "k2": "v2", "k3": "v2"}]}
    (3 rows)
    

    I hope the inline comments explain how it works in enough detail.
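To apply the same transformation to the real table, the SELECT above can be turned into an UPDATE ... FROM. The following is a minimal sketch, not a tested migration. It assumes the table is schemaname.tablename as in the question, that "somekey" sits at the top level of the jsonb value (adjust the path if it is nested deeper), and a PostgreSQL version that provides jsonb_set and jsonb_object. The jsonb_typeof filter and the alias names (parsed, kv, pair, obj) are additions made for illustration.

```sql
UPDATE schemaname.tablename AS t
SET value = jsonb_set(
        t.value,        -- keep every other key of the document untouched
        '{somekey}',    -- assumed path of the property to replace
        jsonb_build_array('java.util.LinkedHashMap'::text, parsed.obj)
    )
FROM (
    SELECT
        src._key,
        jsonb_object(array_agg(kv.pair)) AS obj  -- {{k,v},{k,v},...} -> JSONB object
    FROM
        schemaname.tablename AS src,
        -- implicit LATERAL join: one row per key=value match
        regexp_matches(
            btrim(src.value->>'somekey', '[]'),
            '(\w[^=]*)=([^,]*)',
            'g'
        ) AS kv(pair)
    -- only touch rows that still hold the original string form
    WHERE jsonb_typeof(src.value->'somekey') = 'string'
    GROUP BY src._key
) AS parsed
WHERE t._key = parsed._key;
```

Rows whose string yields no key=value match never appear in the subquery and are therefore left unchanged. Running the statement inside a transaction and inspecting a few rows before COMMIT is a cheap safeguard.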

    Without requiring aggregate/group by:

    The following variant needs no grouping because it does not use the aggregate function array_agg. It is, however, more lenient about what it treats as a separator, so malformed data easily breaks the query with an error (the previous variant just drops the offending key-value pair):

    WITH tablename (_key, value) AS (
        VALUES
            ('test', '{"somekey":"[k1=v1, k2=v2, k3=v2]"}'::jsonb),
            ('second', '{"somekey":"[no one=wants to, see=me]"}'::jsonb)
        )
    SELECT
        _key,
        jsonb_insert(
            '{"somekey":["java.util.LinkedHashMap"]}', -- basic JSON structure
            '{somekey,0}', -- path to insert after
            jsonb_object( -- create a JSONB object on-the-fly from the key-value array
                key_values -- take the keys + values as split using the function
            ),
            true -- we want to insert after the matching element, not before it
        ) AS json_transformed
    FROM
        tablename AS tab,
        -- the following is an implicit LATERAL join (the function is evaluated for each row of the preceding table)
        regexp_split_to_array( -- produces an array of keys and values: [k, v, k, v, ...]
            btrim(tab.value->>'somekey', '[]'), -- as you started with
            '(=|,\s*)' -- regex to match both separators
        ) AS key_values
    ;
    

    ...resulting in:

      _key  |                                json_transformed                                
    --------+--------------------------------------------------------------------------------
     test   | {"somekey": ["java.util.LinkedHashMap", {"k1": "v1", "k2": "v2", "k3": "v2"}]}
     second | {"somekey": ["java.util.LinkedHashMap", {"see": "me", "no one": "wants to"}]}
    (2 rows)
    

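    Feeding it garbage (as in the "second" row before), or any other input that splits into an odd number of elements, such as a single extra = character inside a value, results in the following error. Input that happens to split into an even number of elements raises no error but is paired up incorrectly; the "third" row before, with both a comma in a key and an = in a value, appears to fall into this category: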

    ERROR:  array must have even number of elements
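Before running the split-based variant on real data, it can help to find the rows that would trigger this error. The sketch below reuses the same split and reports how many elements each row produces. The column names element_count and splits_evenly are made up for this example, and the test data deliberately includes the garbage row.

```sql
WITH tablename (_key, value) AS (
    VALUES
        ('test', '{"somekey":"[k1=v1, k2=v2, k3=v2]"}'::jsonb),
        ('second', '{"somekey":"[no one=wants to, see=me, with garbage]"}'::jsonb)
    )
SELECT
    tab._key,
    cardinality(key_values) AS element_count,        -- number of split pieces
    cardinality(key_values) % 2 = 0 AS splits_evenly -- false: jsonb_object would raise the error
FROM
    tablename AS tab,
    -- same implicit LATERAL join and separator regex as in the variant above
    regexp_split_to_array(
        btrim(tab.value->>'somekey', '[]'),
        '(=|,\s*)'
    ) AS key_values
;
```

An even count is only a necessary condition. A row with stray separators can still split evenly and end up with mismatched keys and values, so the regexp_matches variant remains the safer choice for untrusted data.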