In a table having 2 columns and 2 Records :
Record 1 : Column 1 - my_col value as: {"XXX": ["123","456"],"YYY": ["246","135"]}
and Column 2 - ID as A123
Record 2 : Column 1 - my_col value as: {"XXX": ["123","456"],"YYY": ["246","135"], "ZZZ":["333","444"]}
and Column 2 - ID as B222
Note: Column2 is not static , every record have different value.
Expectation :
Key | Value | ID |
---|---|---|
XXX | 123 | A123 |
XXX | 456 | A123 |
YYY | 246 | A123 |
YYY | 135 | A123 |
ZZZ | 333 | B222 |
ZZZ | 444 | B222 |
Query used :
SELECT
t.key,
kv.kval as value,
ID
FROM (
SELECT
explode(map(
'XXX',
split(regexp_replace(get_json_object(my_col,'$.XXX'),'"|\\[|\\]',''),','),
'YYY',
split(regexp_replace(get_json_object(my_col,'$.YYY'),'"|\\[|\\]',''),',')
))
FROM
input_df
) t LATERAL VIEW explode(t.value) kv as kval
As you recommended flattening is good. Facing issue In retrieve ID value from the table.
If your table contains some other columns and you need to select them along with values from explode, then use one more lateral view in the subquery. See comments in the code:
SELECT
t.key,
kv.kval as value,
t.ID
FROM (
SELECT
e.key, e.value, --columns from lateral view
t.id --column from table
FROM
input_df t --add alias
--move explode to lateral view in the FROM
lateral view explode(map(
'XXX',
split(regexp_replace(get_json_object(my_col,'$.XXX'),'"|\\[|\\]',''),','),
'YYY',
split(regexp_replace(get_json_object(my_col,'$.YYY'),'"|\\[|\\]',''),',')
)) e as key, value --add alias and col names
) t LATERAL VIEW explode(t.value) kv as kval