apache-sparkapache-zeppelin

how to extract data JSON from zeppelin sql


I query to test_tbl table on Zeppelin. the table data structure looks like as below :

%sql
desc stg.test_tbl
col_name | data_type | comment
id       |  string   |
title    |  string   |
tags     |  string   |

The tags column has data JSON type following as :

{"name":[{"family": null, "first": "nelson"}, {"pos_code":{"house":"tlv", "id":"A12YR"}}]}

and I want to see the JSON data with columns, so my query is :

select *, tag.*
from stg.test_tbl as t
lateral view explode(t.tags.name) name as name
lateral view explode(name.pos_code) pos_code as pos_code

but when I query, it returns

Can't extract value from tags#3423: need struct type but got string; line 3 pos 21
set zeppelin.spark.sql.stacktrace = true to see full stacktrace

should i query as string in where statement?


Solution

  • Answered myself. I could use get_json_object in string type of JSON. Also, if the JSON format is array like below

    {"name":[{"family": null, "first": "nelson"}, {"pos_code":{"house":"tlv", "id":"A12YR"}}]}
    

    Then I could query using the key

    select * from stg.test_tbl as t
    where t.pos_code[0].house = "tlv"