I am parsing a json
in presto and applying regular expression on top of it.
{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"qbc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}
Regexp used :
REGEXP_EXTRACT(JSON_EXTRACT_SCALAR(
Opfields,
'$.value'
),
'.*"abc":(.*?),',
1
) abc
Now this works for abc as there is a comma after abc value but does not work for ghi. For ghi I know in this example I could use } in this case. But the problem is that these scaler values like abc , ghi can occur in different order in which case it might end up with a comma or }.
I am looking for an expression which would work in every case no matter what the order is. Essentially I want the values for abc, def and ghi no matter in which order they appear in the json.
Can someone please advice a good way to do this.
I have mentioned the regexp I have tried
You don't need to use regex here, based on the shown data you seem to have just a double encoded JSON here (i.e. value
contains a JSON string which holds an encoded JSON) so you can parse it again and process it as JSON. With Presto (due to it's limited JSON path support) it will be a bit cumbersome involving some casts to maps and arrays:
-- sample data
with dataset(opfields) as(
values ('{"value":"{\"123\":{\"type\":\"TAGGING\",\"role\":\"RESULT\",\"payload\":[{\"values\":{\"abc\":\"not-abc\",\"def\":\"yes\",\"ghi\":\"gh\"}}]}}"}')
)
-- query
select JSON_EXTRACT(js1, '$.values.abc') abc,
JSON_EXTRACT(js1, '$.values.ghi') ghi
from dataset,
unnest (cast(JSON_PARSE(JSON_EXTRACT_SCALAR(opfields, '$.value')) as map(varchar, json))) as t(k, js),
unnest (cast(JSON_EXTRACT(js, '$.payload') as array(json))) as t1(js1);
Output:
Abc | ghi
-----------------+-----------
"not-abc" | "gh"
If you actually are using Trino you can use json_query
:
select json_query(JSON_EXTRACT_SCALAR(opfields, '$.value'), 'strict $.*.payload[*].values.abc' WITHOUT ARRAY WRAPPER) sensitivity,
json_query(JSON_EXTRACT_SCALAR(opfields, '$.value'), 'strict $.*.payload[*].values.ghi' WITHOUT ARRAY WRAPPER) certainty
from dataset;