I am trying to parse JSON and I want Key-Value Pair as output. I have used tNormalize to split on comma and then tExtractRegex to convert into Key Value Pair. This was working fine until one day I got a comma in the JSON value. However this comma is between the quotes and hence the JSON is valid but since the tNormalize component is splitting on commas, it is also taking these commas for splitting.
Eg. My Sample JSON:
{
"PQR":"XYZ",
"data":[
{
"Name":"John","Address":"someCity"
},
{
"Name":"Mary","Address":"someCity,somecountry","Age":39
}
]
}
The expected Output for this would be a key value pair:
Key|Value
Name|John
Address|someCity
Name|Mary
Address|someCity,someCountry
Age|39
Please note that I have taken this approach since the schema of the JSON can change over time and we may get more attributes in it. Looking for a dynamic solution.
This task would be better handled by a json parser than by regular expressions.
Here's a solution using tExtractJSONFields
component. set the loop Json query to $.data[*]
in order to loop on the data array, and for each object get the Name
, Address
and Age
(Age is null for John). Since you want key/value pairs on output, the tSplitRow
creates 3 rows out of each input row, with Name, Address and Age each on a separate row. Then tFilterRow
will filter out the row where Age is null.
And the components schemas :
Here's the output:
.-------+--------------------.
| tLogRow_1 |
|=------+-------------------=|
|key |value |
|=------+-------------------=|
|Name |John |
|Address|someCity |
|Name |Mary |
|Address|someCity,somecountry|
|Age |39 |
'-------+--------------------'
Make sure you have a valid json, in your example you are missing a comma between the objects inside data.
Edit:
Based on your requirement (dynamic JSON), a simple solution would be to replace the comma inside your values with another unique character or set of characters not present in your data (say ##) so they are not picked up by the normalizer, then parse your JSON as before and lastly replace back the comma.
You can achieve this by using tReplace
before your tNormalize
to replace commas inside values (not between ","). Select 'Advanced Mode', and use this pattern:
(?<=\w+\s*),(?=\s*\w+)
And replace with ##
My regex works also if you have spaces before or after the comma.
After you do your parsing, use another tReplace
to replace ##
with ","