Search code examples
sqlarraysjsonprestotrino

Athena/Presto: unnesting list of objects from serialized JSON string


My Athena/Presto database has a serialized JSON object, which I am having a difficulty to handle.
Here is an example of the data:

WITH input AS (
    SELECT JSON_PARSE('"[{\"is_flag\": 0, \"url\": \"https://app.com/123\"},{\"is_flag\": 1, \"url\": \"https://app.com/456\"}]"') AS json_obj
)

SELECT typeof(x),
    x

FROM
    (input) AS t(x)
    CROSS JOIN UNNEST (CAST(x AS ARRAY<JSON>)) AS u(y)
    

the error message says:

INVALID_CAST_ARGUMENT: Cannot cast to array(json). Expected a json array, but got [{"is_flag": 0, "url": "https://app.com/123"},{"is_flag": 1, "url": "https://app.com/456"}] "[{"is_flag": 0, "url": "https://app.com/123"},{"is_flag": 1, "url": "https://app.com/456"}]"

While what I eventually interested in is a table as below:


Solution

  • Your json is actually double encoded one (i.e. it is a json string containing encoded json), so you need to double decode it with second json_parse (note that I use succinct syntax for unnest):

    WITH input AS (
        SELECT JSON_PARSE('"[{\"is_flag\": 0, \"url\": \"https://app.com/123\"},{\"is_flag\": 1, \"url\": \"https://app.com/456\"}]"') AS json_obj
    )
    
    -- query
    SELECT json_extract_scalar(obj, '$.is_flag') is_flag,
           json_extract_scalar(obj, '$.url') url
    FROM input,
    unnest(cast(json_parse(cast(json_obj as varchar)) as array(json))) as t(obj);
    

    Output: