Search code examples
sqljsonamazon-athenaprestotrino

transform json in aws athena


I have a table called Game, with a history jsonb column. The json looks like this:

{"Games": {"key1": "value1", "key2": value2}}

(the values can be a string or a number). I want to transform it to a json of this format:

{"Games": "AllGames": [{"key1": ["value1"]}, {"key2": ["value2"}]]}

The original json has one value for each key, which transform to an array with one field in the new json, as a string, even if the original value was a number.

I have this working code in postgres, for cases when history is not null:

SELECT jsonb_build_object(
        'Games', jsonb_build_object(
                'AllGames', jsonb_agg(
                        jsonb_build_object(
                                key, json_build_array(CASE
                                                          WHEN value::text LIKE '"%"' THEN value
                                                          ELSE (concat('"',value::text,'"'))::jsonb
                            END)
                        )
                                )
               )
                     )
FROM
    (
        SELECT jsonb_each(history::jsonb->'Games').*
        FROM public.Game
    ) sub

But I'm not sure how to do it in Athena.

Tried to transform the json.


Solution

  • There are multiple ways to achieve that. In basis of almost all of them would be using ability to cast JSON to some type (for example JSON object -> map(varchar, json)) and back. For example using some concatenation (to remove some map creation):

    WITH dataset(json_str) AS
    (
       values ('{"Games": {"key1": "value1", "key2": 1}}')
    )
    
    SELECT '{"Games": { "AllGames": '
            || json_format(
              cast(
                 map_values(
                   transform_values(
                           cast(
                             json_extract(json_str, '$.Games') -- read the inner object as JSON type 
                               as map(varchar, json)
                           ) -- cast it to map
                       , (k,v) -> map(array[k], array[ array[v] ]) -- transform map values to another map
                   )
                 ) -- read array of transformed map values 
                  as json) -- cast it to JSON
               ) -- format JSON to string
            || '}}'
    FROM dataset;
    

    Output:

                                _col0                             
    --------------------------------------------------------------
     {"Games": { "AllGames": [{"key1":["value1"]},{"key2":[1]}]}}