I have a hive table with IDs and JSON such as below:
id json
----------
21 | {"temp":"3","list":[{"url":"aaa.com"},{"url":"bbb.com"}]}
42 | {"temp":"2","list":[{"url":"qqq.com"},{"url":"vvv.com"}]}
The desired output is such as below:
id url
----------
21 | aaa.com
21 | bbb.com
42 | qqq.com
42 | vvv.com
Would anyone help with this hive query?
Applying explode() directly does not work as json column is a string.
Try with below query:
hive> with cte as (
select stack(2,int(21),string('{"temp":"3","list":[{"url":"aaa.com"},{"url":"bbb.com"}]}'),
int(42),string('{"temp":"2","list":[{"url":"qqq.com"},{"url":"vvv.com"}]}')) as (id,json))
select id,url from (
select id,
split(
regexp_replace(
get_json_object(json,'$.list.url'),
'(\\[|\\]|\")','')
,',')jsn from cte)t
lateral view explode(jsn)asd as url;
Output:
id url
21 aaa.com
21 bbb.com
42 qqq.com
42 vvv.com
Functions Explanation:
Stack
--is used to create sample data
get_json_object
-- to extract data from json string
regexp_replace
--to replace [,],"
characters
split
--split on ,
and this will return array
explode
--use array column to explode