Search code examples
sqljsonhivehqlcross-join

Hive: Extract Data From Nested JSON and Append


I have a hive table with IDs and JSON such as below:

id   json
----------
21 | {"temp":"3","list":[{"url":"aaa.com"},{"url":"bbb.com"}]}
42 | {"temp":"2","list":[{"url":"qqq.com"},{"url":"vvv.com"}]}

The desired output is such as below:

id   url
----------
21 | aaa.com
21 | bbb.com
42 | qqq.com
42 | vvv.com

Would anyone help with this hive query?

Applying explode() directly does not work as json column is a string.


Solution

  • Try with below query:

    hive> with cte as (
    select stack(2,int(21),string('{"temp":"3","list":[{"url":"aaa.com"},{"url":"bbb.com"}]}'),
    int(42),string('{"temp":"2","list":[{"url":"qqq.com"},{"url":"vvv.com"}]}')) as (id,json))  
    select id,url from (
    select id,
    split(
    regexp_replace(
            get_json_object(json,'$.list.url'),
      '(\\[|\\]|\")','')
     ,',')jsn from cte)t 
    lateral view explode(jsn)asd as url;
    

    Output:

    id      url
    21      aaa.com
    21      bbb.com
    42      qqq.com
    42      vvv.com
    

    Functions Explanation:

    Stack --is used to create sample data

    get_json_object -- to extract data from json string

    regexp_replace --to replace [,]," characters

    split --split on , and this will return array

    explode --use array column to explode