sql amazon-web-services amazon-athena presto trino

Athena unnest for nested Array string column

I have a column in athena whose datatype is string and data looks like:

[{'main_key1': {'key1': value, 'key2': value2}, 'outer_key': outer_value},{'main_key2': {'key1': value, 'key2': value2}, 'outer_key': outer_value1}]

I have to split the value of this list (outer_key1) across multiple rows.

for eg:

id	outer_key
1	outer_value
1	outer_value1

Can anyone please help if there is a way to acheive this

I already tried json_parse, unnest but everything is ending in some kind of exception

Solution

It depends on what your data actually is. If actually is a valid JSON string then you can process it as JSON:

-- sample data
with dataset(some_json) as(
    values ('[{"main_key1": {"key1": 1, "key2": "value2"}, "outer_key": "outer_value"},{"main_key2": {"key1": 1, "key2": 2}, "outer_key": "outer_value1"}]')
)

-- query
select json_extract(js, '$.outer_key')
from dataset,
     unnest(cast(json_parse(some_json) as array(json))) as t(js);

Output:

     _col0
----------------
 "outer_value"
 "outer_value1"

If it is just a string then you will need to use regular expressions. Something to get you started (the regular expression can change based on actual data):

with dataset(some_str) as(
    values ('[{''main_key1'': {''key1'': value, ''key2'': value2}, ''outer_key'': outer_value},{''main_key2'': {''key1'': value, ''key2'': value2}, ''outer_key'': outer_value1}]')
)

select val
from dataset,
     unnest(regexp_extract_all(some_str, '''outer_key''\s?:\s?\b(\w*)\b', 1)) as t(val);

Output:

     val
--------------
 outer_value
 outer_value1