Search code examples
sqlarrayshivehiveqlexplode

Hive Explode the Array of Struct key: value:


This is the below Hive Table

CREATE EXTERNAL TABLE IF NOT EXISTS SampleTable
(
USER_ID string,
DETAIL_DATA array<struct<key:string,value:string>>
)

And this is the data in the above table-

11111    [{"key":"client_status","value":"ACTIVE"},{"key":"name","value":"Jane Doe"}]

Is there any way I can get the below output using HiveQL?

  **client_status**  |   **name**
  -------------------+----------------
         ACTIVE         Jane Doe  

I tried use explode() but I get result like that:

SELECT details
FROM sample_table
lateral view explode(DETAIL_DATA) exploded_table as details;


               **details** 
-------------------------------------------+
   {"key":"client_status","value":"ACTIVE"}        
 ------------------------------------------+
     {"key":"name","value":"Jane Doe"}              

Solution

  • Use laterral view [outer] inline to get struct elements already etracted and use conditional aggregation to get values corresponting to some keys grouped in single row, use group_by user_id.

    Demo:

    with sample_table as (--This is your data example
    select '11111' USER_ID,
    array(named_struct('key','client_status','value','ACTIVE'),named_struct('key','name','value','Jane Doe')) DETAIL_DATA
    )
    
    SELECT max(case when e.key='name' then e.value end) as name, 
           max(case when e.key='client_status' then e.value end) as status
    FROM sample_table
    lateral view inline(DETAIL_DATA) e as key, value
    group by USER_ID
    

    Result:

        name    status  
    ------------------------
    Jane Doe    ACTIVE
    

    If you can guarantee the order of structs in array (one with status comes first always), you can address nested elements dirctly

    SELECT detail_data[0].value as client_status,
           detail_data[1].value as name
     from sample_table 
    

    One more approach, if you do not know the order in array, but array is of size=2, CASE expressions without explode will give better performance:

    SELECT case when DETAIL_DATA[0].key='name' then DETAIL_DATA[0].value else  DETAIL_DATA[1].value end as name, 
           case when DETAIL_DATA[0].key='client_status' then DETAIL_DATA[0].value else  DETAIL_DATA[1].value end as status
    FROM sample_table