Search code examples
sqlprestotrino

splitting a dict-like varchar column into multiple columns using SQL presto


I have a column in a table that is varchar but has a dictionary-like format. Some rows have more key-value pairs (for example first row has 3 pairs and second row has 4 pairs). For example:

column
{"customerid":"12345","name":"John", "likes":"Football, Running"}
{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}

I need a query that can "explode" the column like so:

customerid name likes dislikes
12345 John Football, Running
54321 Sam Art Hiking

No extra rows are added. Just extra columns (There are other already existing columns in the table).

I've tried casting the varchar column to an array and then using UNNEST function but it doesn't work. I think that method creates extra rows.

I am using Prestosql.


Solution

  • Your data looks like json, so you can parse and process it:

    -- sample data
    WITH dataset (column) AS (
        VALUES ('{"customerid":"12345","name":"John", "likes":"Football, Running"}' ),
            ('{"customerid":"54321","name":"Sam", "likes":"Art", "dislikes":"Hiking"}')
    ) 
    
    --query
    select json_extract_scalar(json_parse(column), '$.customerid') customerid,
        json_extract_scalar(json_parse(column), '$.name') name,
        json_extract_scalar(json_parse(column), '$.likes') likes,
        json_extract_scalar(json_parse(column), '$.dislikes') dislikes
    from dataset
    

    Output:

    customerid name likes dislikes
    12345 John Football, Running
    54321 Sam Art Hiking

    In case of many columns you can prettify it by casting to parsed json to map (depended on contents it can be map(varchar, varchar) or map(varchar, json)):

    --query
    select m['customerid'] customerid,
        m['name'] name,
        m['likes'] likes,
        m['dislikes'] dislikes
    from (
        select cast(json_parse(column) as map(varchar, varchar)) m
        from dataset
    )