So I have a sample of Hive data:
Customer | xx_var | yy_var | branchflow |
---|---|---|---|
{"customer_no":"239230293892839892","acct":["2324325","23425345"]} | 23 | 3 | [{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}] |
And I want to transform it into something like this:
Customer_no | acct | xx_var | yy_var | branchflow |
---|---|---|---|---|
239230293892839892 | 2324325 | 23 | 3 | [1,2,3,4,5,6,6,6,4] |
239230293892839892 | 23425345 | 23 | 3 | [1,2,3,4,5,6,6,6,99,4] |
I have tried using this query, but getting the wrong output format.
SELECT
customer.customer_no,
acct,
xx_var,
yy_var,
bi_acctno,
values_bi
FROM
struct_test
LATERAL VIEW explode(customer.acct) acct AS acctno
LATERAL VIEW explode(brancflow.acctno) bia as bi_acctno
LATERAL VIEW explode(brancflow.value) biv as values_bi
WHERE bi_acctno = acctno
Does anyone know how to approach this problem?
Use json_tuple to extract JSON elements. In case of array, it returns it also as string: remove square brackets, split and explode. See comments in the demo code.
Demo:
with mytable as (--demo data, use your table instead of this CTE
select '{"customer_no":"239230293892839892","acct":["2324325","23425345"]}' as customer,
23 xx_var, 3 yy_var,
'[{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}]' branchflow
)
select c.customer_no,
a.acct,
t.xx_var, t.yy_var,
get_json_object(b.acct_branchflow,'$.value') value
from mytable t
--extract customer_no and acct array
lateral view json_tuple(t.customer, 'customer_no', 'acct') c as customer_no, accts
--remove [] and " and explode array of acct
lateral view explode(split(regexp_replace(c.accts,'^\\[|"|\\]$',''),',')) a as acct
--remove [] and explode array of json
lateral view explode(split(regexp_replace(t.branchflow,'^\\[|\\]$',''),'(?<=\\}),(?=\\{)')) b as acct_branchflow
--this will remove duplicates after lateral view: need only matching acct
where get_json_object(b.acct_branchflow,'$.acctno') = a.acct
Result:
customer_no acct xx_var yy_var value
239230293892839892 2324325 23 3 [1,2,3,4,5,6,6,6,4]
239230293892839892 23425345 23 3 [1,2,3,4,5,6,6,6,99,4]