Search code examples
jsontalend

Extract Key Value Pair from JSON having commas in the value


I am trying to parse JSON and I want Key-Value Pair as output. I have used tNormalize to split on comma and then tExtractRegex to convert into Key Value Pair. This was working fine until one day I got a comma in the JSON value. However this comma is between the quotes and hence the JSON is valid but since the tNormalize component is splitting on commas, it is also taking these commas for splitting.

Eg. My Sample JSON:

{
"PQR":"XYZ",
"data":[
{
"Name":"John","Address":"someCity"
},
{
"Name":"Mary","Address":"someCity,somecountry","Age":39
}
]
}

The expected Output for this would be a key value pair:

Key|Value
Name|John
Address|someCity
Name|Mary
Address|someCity,someCountry
Age|39

Please note that I have taken this approach since the schema of the JSON can change over time and we may get more attributes in it. Looking for a dynamic solution.


Solution

  • This task would be better handled by a json parser than by regular expressions.
    Here's a solution using tExtractJSONFields component. set the loop Json query to $.data[*] in order to loop on the data array, and for each object get the Name, Address and Age (Age is null for John). Since you want key/value pairs on output, the tSplitRow creates 3 rows out of each input row, with Name, Address and Age each on a separate row. Then tFilterRow will filter out the row where Age is null.

    enter image description here

    And the components schemas :

    enter image description here

    enter image description here

    Here's the output:

    .-------+--------------------.
    |         tLogRow_1          |
    |=------+-------------------=|
    |key    |value               |
    |=------+-------------------=|
    |Name   |John                |
    |Address|someCity            |
    |Name   |Mary                |
    |Address|someCity,somecountry|
    |Age    |39                  |
    '-------+--------------------'
    

    Make sure you have a valid json, in your example you are missing a comma between the objects inside data.

    Edit:
    Based on your requirement (dynamic JSON), a simple solution would be to replace the comma inside your values with another unique character or set of characters not present in your data (say ##) so they are not picked up by the normalizer, then parse your JSON as before and lastly replace back the comma.

    You can achieve this by using tReplace before your tNormalize to replace commas inside values (not between ","). Select 'Advanced Mode', and use this pattern:

    (?<=\w+\s*),(?=\s*\w+)
    

    And replace with ##

    My regex works also if you have spaces before or after the comma.

    After you do your parsing, use another tReplace to replace ## with ","