Tags: csv, azure-data-explorer, kql

Split by comma, but keep strings in double quotes atomic


I'm trying to figure out a regex pattern that splits the following string:

2022-09-22T03:55:59.433Z,,,,sm100,"sm100.w.gm.net=25 2.7.2 mailto:[email protected] [IId=200023, Hostname=mky.wgm.net] Queued info",,SMTP,HAREDIRECT

into the following values:

2022-09-22T03:55:59.433Z,
,
,
,
sm100,
"sm100.w.gm.net=25 2.7.2 mailto:[email protected] [IId=200023, Hostname=mky.wgm.net] Queued info",
,
SMTP,
HAREDIRECT

I do not want the regex to split the sixth value (the longest one) at its internal comma. Although there is a comma after IId=200023, the whole string is enclosed in double quotes and should be treated as atomic.

I have tried a lot of patterns inside regex101, and this pattern is as far as I've gotten:

,(?![" ])


It seems to identify the separating commas correctly, but I can't find a way to change the pattern so that it captures the values between them as groups.


Solution

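  • Use the built-in parse_csv() function instead of a regex. It splits a single comma-separated record into an array of strings and keeps double-quoted fields intact.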

    print txt = '2022-09-22T03:55:59.433Z,,,,sm100,"sm100.w.gm.net=25 2.7.2 mailto:[email protected] [IId=200023, Hostname=mky.wgm.net] Queued info",,SMTP,HAREDIRECT'
    | project parse_csv(txt)
    
    txt
    ["2022-09-22T03:55:59.4330000Z","","","","sm100","sm100.w.gm.net=25 2.7.2 mailto:[email protected] [IId=200023, Hostname=mky.wgm.net] Queued info","","SMTP","HAREDIRECT"]
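
Once the record is split, individual values can be read from the resulting array by zero-based index. The following is a minimal sketch rather than part of the original answer: the output column names (timestamp, host, message, protocol, action) are assumptions chosen for illustration, since the question does not say what each field means.

```kql
// Sketch: split the record with parse_csv(), then pick values by zero-based index.
// The output column names are hypothetical; the source does not name the fields.
print txt = '2022-09-22T03:55:59.433Z,,,,sm100,"sm100.w.gm.net=25 2.7.2 mailto:[email protected] [IId=200023, Hostname=mky.wgm.net] Queued info",,SMTP,HAREDIRECT'
| extend parts = parse_csv(txt)
| project
    timestamp = todatetime(parts[0]),   // first value, converted to datetime
    host      = tostring(parts[4]),     // sm100
    message   = tostring(parts[5]),     // the quoted value, inner comma preserved
    protocol  = tostring(parts[7]),     // SMTP
    action    = tostring(parts[8])      // HAREDIRECT
```

The elements of the array are dynamic values, so tostring() and todatetime() are used to give each projected column a concrete type.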
