This might be hard to explain, but I will do my best. I am currently working on a CSV transform stream parser in Node.js, but I am struggling to replace all \n and \r characters inside the quotes (") that wrap a value.
At the moment I have the following regex:
(^|[;])"(?:""|[^"])*[\n\r]+(?:""|[^"])*"
Where ; is the column delimiter.
Here are two examples: in the first the regex does what is expected, and in the second it matches but shouldn't, because the ; is inside quotes.
First Test (success)
test;"123";"this description with new line feed below should be
matched by regex";test;"1.0"
Second Test (error)
NewLine1;"test - this one should not be captured by the regex but its being captured ";test;1
NewLine2;"test that went wrong"
Is there a way to match the text between quotes only when there is a semicolon before the opening quote and a semicolon after the closing quote, while ignoring semicolons inside the quotes? I think that's what I need, so that the second example is not matched by the regex.
Thank you in advance.
You may use:
(^|;)"(?:""|[^";])*[\n\r]+(?:""|[^";])*"
I changed [;] to ; because they're equivalent in your case. I also added the ; character to the negated class, giving [^";], because I'm assuming a column value in your CSV stream can't contain this character.
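Here is a minimal sketch of how this regex could be applied in Node.js. It assumes the whole text is available as one string; in a real transform stream a quoted value can be split across chunks, which this sketch does not handle. The helper name flattenQuotedNewlines, the choice of a space as the replacement, and the second sample row are made up for illustration.

```javascript
// Regex from the answer, with the g flag added so every match is visited.
const quotedWithNewline = /(^|;)"(?:""|[^";])*[\n\r]+(?:""|[^";])*"/g;

// Hypothetical helper: inside each matched quoted value, replace every
// run of line breaks with a single space (the replacement is an assumption).
function flattenQuotedNewlines(text) {
  return text.replace(quotedWithNewline, (match) =>
    match.replace(/[\n\r]+/g, " ")
  );
}

// First example from the question: the line break sits inside a quoted value.
const firstExample =
  'test;"123";"this description with new line feed below should be\n' +
  'matched by regex";test;"1.0"';
console.log(flattenQuotedNewlines(firstExample));

// Made-up row where a semicolon sits right before a closing quote.
// The original [^"] version would match across the row boundary here;
// with [^";] nothing matches, so the two rows stay separate.
const trickyRow = 'a;"x;";b\nc;"y"';
console.log(flattenQuotedNewlines(trickyRow) === trickyRow);
```

Because ; is excluded from the quoted value, a quoted value that contains both a semicolon and a line break is not matched either, so this only fits data where that combination cannot occur.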
I don't know why you have "" in the regex, but if you want to allow other double quotes inside the column value, I assume they must be escaped with \, so you can use a regex like:
(^|;)"(?:(?<=\\)"|[^";])*[\n\r]+(?:(?<=\\)"|[^";])*"
It has (?<=\\)" instead of "", which matches a " character preceded by a backslash (\").
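As a rough sketch, the backslash-escaped variant can be tried the same way. This assumes a JavaScript engine with lookbehind support (recent Node.js versions have it) and that quotes inside values really are written as \" in your data. The helper name flattenEscapedQuoted and the sample row are hypothetical.

```javascript
// Variant from the answer: a quote inside a value counts as content only
// when it is preceded by a backslash. The g flag is added for replace().
const escapedQuoteVariant =
  /(^|;)"(?:(?<=\\)"|[^";])*[\n\r]+(?:(?<=\\)"|[^";])*"/g;

// Hypothetical helper: replace line breaks inside matched quoted values
// with a single space (the replacement character is an assumption).
function flattenEscapedQuoted(text) {
  return text.replace(escapedQuoteVariant, (match) =>
    match.replace(/[\n\r]+/g, " ")
  );
}

// Made-up row: the quoted value contains \"hi\" and a line break.
const row = 'id;"say \\"hi\\"\nand bye";1';
console.log(flattenEscapedQuoted(row));
```

Note that a value ending in a literal backslash right before the closing quote would be misread by this pattern, since the lookbehind only checks the single preceding character.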