I have a column in athena whose datatype is string and data looks like:
[{'main_key1': {'key1': value, 'key2': value2}, 'outer_key': outer_value},{'main_key2': {'key1': value, 'key2': value2}, 'outer_key': outer_value1}]
I have to split the value of this list (outer_key1) across multiple rows.
for eg:
id | outer_key |
---|---|
1 | outer_value |
1 | outer_value1 |
Can anyone please help if there is a way to acheive this
I already tried json_parse, unnest but everything is ending in some kind of exception
It depends on what your data actually is. If actually is a valid JSON string then you can process it as JSON:
-- sample data
with dataset(some_json) as(
values ('[{"main_key1": {"key1": 1, "key2": "value2"}, "outer_key": "outer_value"},{"main_key2": {"key1": 1, "key2": 2}, "outer_key": "outer_value1"}]')
)
-- query
select json_extract(js, '$.outer_key')
from dataset,
unnest(cast(json_parse(some_json) as array(json))) as t(js);
Output:
_col0
----------------
"outer_value"
"outer_value1"
If it is just a string then you will need to use regular expressions. Something to get you started (the regular expression can change based on actual data):
with dataset(some_str) as(
values ('[{''main_key1'': {''key1'': value, ''key2'': value2}, ''outer_key'': outer_value},{''main_key2'': {''key1'': value, ''key2'': value2}, ''outer_key'': outer_value1}]')
)
select val
from dataset,
unnest(regexp_extract_all(some_str, '''outer_key''\s?:\s?\b(\w*)\b', 1)) as t(val);
Output:
val
--------------
outer_value
outer_value1