Search code examples
jsonhivehiveqlregexp-replacejson-extract

regex_replace function skips anything coming after a NULL value


In my hive table "ticket_full" I have a json type column named "service_id" that I would like to extract in 3 columns, which is like this

[{"position":"1","typeid":"ROUTNAME","value":"PWAW13197"},{"position":"2","typeid":"CDCNAME","value":null},{"position":"3","typeid":"SVCNAME","value":"Business"},{"position":"4","typeid":"USID","value":"FI021MLQE4"}]

[{"position":"1","typeid":"ROUTNAME","value":"KHLA30076"},{"position":"2","typeid":"CDCNAME","value":"eff-e-rjh-sw-cs2"},{"position":"3","typeid":"SVCNAME","value":"Managed LAN"},{"position":"4","typeid":"USID","value":"SA00BNGH0E"}]

[{"position":"1","typeid":"NUMLIAPTT","value":"0492212984"},{"position":"2","typeid":null,"value":null},{"position":"3","typeid":null,"value":null},{"position":"4","typeid":null,"value":null}]

I used the code below:

SELECT get_json_object(single_json_table.identifiant_produit, '$.position') AS position,
  get_json_object(single_json_table.identifiant_produit, '$.typeid') AS typeid,
  get_json_object(single_json_table.identifiant_produit, '$.value') AS value
  FROM   
(SELECT explode(split(regexp_replace(substr(serviceid, 2, length(serviceid)-2),
            '"},\\{"', '"},,,,{"'), ',,,,')  ) as identifiant_produit 
  FROM ticket_full) single_json_table

it works but every time there is a value at NULL, it ignores what follows and goes to the next field: example:

enter image description here

Does anyone know how to fix this please ?


Solution

  • It is because null has no double-quotes and you are replacing this '"},\\{"' with this '"},,,,{"'

    Try to remove double-quote before } in the regex pattern and replacement string accordingly, then it will work with quoted values and nulls also:

    split(regexp_replace(substr(serviceid, 2, length(serviceid)-2),
                '},\\{"', '},,,,{"'), ',,,,')