Search code examples
sqlregexhivehiveql

How to extract a JSON value in Hive


I Have a JSON string that is stored in a single cell in the DB corresponding to a parent ID

{"profileState":"ACTIVE","isDefault":"true","joinedOn":"2019-03-24T15:19:52.639Z","profileType":"ADULT","id":"abc","signupDeviceId":"1"}||{"profileState":"ACTIVE","isDefault":"true","joinedOn":"2021-09-05T07:47:00.245Z","imageId":"19","profileType":"KIDS","name":"Kids","id":"efg","signupDeviceId":"1"}

Now I want to use the above JSON to extract the id from this. Let say we have data like

Parent ID  |  Profile JSON
1          |  {profile_json} (see above string)

I want the output to look like this

Parent ID  |  ID
1          |  abc
1          |  efg

Now, I've tried a couple of iterations to solve this

First Approach:

select
    get_json_object(p.profile, '$$.id') as id,
    test.parent_id
    
    from (
        select split(
                regexp_replace(
                    regexp_extract(profiles, '^\\[(.+)\\]$$',1),
                '\\}\\,\\{', '\\}\\|\\|\\{'),
                '\\|\\|') as profile_list,
                parent_id ,
                
        from source_table) test
        lateral view explode(test.profile_list) p as profile
)

But this is returning the id column as having NULL values. Is there something I'm missing here.

Second Approach:

with profiles as(
  select        regexp_replace(
                    regexp_extract(profiles, '^\\[(.+)\\]$$',1),
                '\\}\\,\\{', '\\}\\|\\|\\{') as profile_list,
                parent_id
                
        from source_table
)      

SELECT
  get_json_object (t1.profile_list,'$.id')
FROM profiles t1

The second approach is only returning the first id (abc) as per the above JSON string.


Solution

  • Solve this. Was using a extract $ in the First Approach

    select
    get_json_object(p.profile, '$.id') as id,
    test.parent_id
    
    from (
        select split(
                regexp_replace(
                    regexp_extract(profiles, '^\\[(.+)\\]$$',1),
                '\\}\\,\\{', '\\}\\|\\|\\{'),
                '\\|\\|') as profile_list,
                parent_id ,
                
        from source_table) test
        lateral view explode(test.profile_list) p as profile
    

    )