Search code examples
sqljsonpresto

Handling different json formats in Presto when extracting as string?


I am trying to extract a string from a JSON in presto. Here are the 2 options for how the extracted data can look before I cast to a string:

  1. JSON_EXTRACT(my_column, '$.my_column_extract') = ["A::1","B::2","C::3","D::4"]
  2. JSON_EXTRACT(my_column, '$.my_column_extract') = {"0":"A::1","1":"B::2","2":"C::3","3":"D::4,5"}

For both above options, the goal string format I want to achieve: For 1) A::1, B::2, C::3, D::4 For 2) A::1, B::2, C::3, D::4,5

So I cast as ARRAY(VARCHAR) and then ARRAY_JOIN to get a string. This works for 1) but not for 2).

  1. ARRAY_JOIN( CAST( JSON_EXTRACT(my_column, '$.my_column_extract') AS ARRAY(VARCHAR) ), ', ' ) = A::1, B::2, C::3, D::4
  2. The same functions result in an error due the format. Error notice (INVALID_CAST_ARGUMENT) Cannot cast to array(varchar). Expected a json array, but got {"0":"A::1","1":"B::2","2":"C::3","3":"D::4,5"}

How can I get 2) to result in the goal string format?


Solution

  • If your records are available only in those two formats then you can leverage try_cast function which returns null if the cast can't be performed and use coalesce two cast to the second format:

    -- sample data
    WITH dataset(my_column) AS (
     values ('{"my_column_extract":["A::1","B::2","C::3","D::4"]}'),
            ('{"my_column_extract":{"0":"A::1","1":"B::2","2":"C::3","3":"D::4,5"}}')
    ),
    
    -- query parts
    extracted as (
        select JSON_EXTRACT(my_column, '$.my_column_extract') extr
        from dataset)
    
    SELECT coalesce(
            try_cast(extr as array(varchar)), -- try process array
            map_values(cast(extr as map(varchar, varchar)))) -- if not array - process json object as map and get map values
    FROM extracted;
    

    Output:

    _col0
    [A::1, B::2, C::3, D::4]
    [A::1, B::2, C::3, D::4,5]

    And then process resulting arrays as needed