Search code examples
sqlarraysjsonhivehiveql

Json - Additional column value added to Flatten Key and Values in Hive


In a table having 2 columns and 2 Records :

Record 1 : Column 1 - my_col value as: {"XXX": ["123","456"],"YYY": ["246","135"]} and Column 2 - ID as A123

Record 2 : Column 1 - my_col value as: {"XXX": ["123","456"],"YYY": ["246","135"], "ZZZ":["333","444"]} and Column 2 - ID as B222

Note: Column2 is not static , every record have different value.

Expectation :

Key Value ID
XXX 123 A123
XXX 456 A123
YYY 246 A123
YYY 135 A123
ZZZ 333 B222
ZZZ 444 B222

Query used :

SELECT
    t.key,
    kv.kval as value,
    ID
FROM (
    SELECT
         explode(map(
             'XXX',
             split(regexp_replace(get_json_object(my_col,'$.XXX'),'"|\\[|\\]',''),','),
             'YYY',
             split(regexp_replace(get_json_object(my_col,'$.YYY'),'"|\\[|\\]',''),',')
         )) 
    FROM
        input_df
) t LATERAL VIEW explode(t.value) kv as kval

As you recommended flattening is good. Facing issue In retrieve ID value from the table.


Solution

  • If your table contains some other columns and you need to select them along with values from explode, then use one more lateral view in the subquery. See comments in the code:

    SELECT
        t.key,
        kv.kval as value,
        t.ID
    FROM (
    SELECT 
     e.key, e.value, --columns from lateral view
     t.id  --column from table  
    FROM
        input_df t --add alias
        --move explode to lateral view in the FROM
        lateral view explode(map(
             'XXX',
             split(regexp_replace(get_json_object(my_col,'$.XXX'),'"|\\[|\\]',''),','),
             'YYY',
             split(regexp_replace(get_json_object(my_col,'$.YYY'),'"|\\[|\\]',''),',')
         )) e as key, value --add alias and col names 
    ) t LATERAL VIEW explode(t.value) kv as kval