Search code examples
sqljsonhive

Hive - Parsing a Table That Has a JSON Column With Multiple Rows Per Field


I have a table similar to this, where i need to pull the value out of each json field that has an id equal to 2. I've tried multiple examples, but cannot manage to wrap my head around how to manipulate json_tuples / lateral views / explodes into my scenario. Any help is much appreciated!

value1,value2,value3,"[{""id"":1,""value"":""x""},{"id"":2,""value"":""y""}, {"id"":3,""value"":""blah""}]"
value4,value5,value6,"[{""id"":1,""value"":""a""},{"id"":2,""value"":""b""}, {"id"":3,""value"":""blahblah""}]"
col1 col2 col3 id2value
value1 value2 value3 y
value4 value5 value6 b

Examples Ive tried to manipulate: How to extract selected values from json string in Hive -dont know how to check for a value within the sub-row

Hive Sql Query To get Json Object from Json Array -dont know how to combine this with selecting the normal columns


Solution

  • Your example is a little unclear, so I'm making some guesses and assumptions (the value* strings are in separate columns, your json string is stored in an array column,... And your quotes in your json array are kind of a mess, that's not valid json.)

    You can use lateral view on your json array (technically an array of structs), and then you can filter on that.

    select
    col1,
    col2,
    col3,
    exp.id,
    exp.value
    from
    <your table>
    lateral view inline(<array column>) exp as id,value
    where
    exp.id = 1
    

    EDIT: If your json is just stored as a string array(of valid json), you have to go through some extra hoops. You convert it to an array with your regex and explode. Then you can pull the individual elements out using json_tuple. If there's a better/less ugly way to do this, I don't know it.

    select
    col1,
    col2,
    col3,
    t2.*,
    jsonstring,
    t.col
    
    
    from
    <your table>
    --first strip the [] off so hive can see this as an array
    LATERAL VIEW explode(
      split(
        regexp_replace(yourtable.jsonstring,'^\\[|\\]$',''), '(?<=\\}),(?=\\{)'   
         )
    ) t as col
    --and then use json_tuple to get the elements from the array
    lateral view json_tuple(t.col,'id','value') t2 as id,value
    where
    t2.id = 2