javascript, regex, linefeed, capturing-group

Fix regex expression used to replace all \n and \r inside quotes


This might be hard to explain, but I will do my best. I am currently working on a CSV transform stream parser in Node.js, and I am struggling to replace all \n and \r characters inside the quotes (") that wrap a value.

At the moment I have the following regex:

(^|[;])"(?:""|[^"])*[\n\r]+(?:""|[^"])*"

Where ; is the column delimiter.

Here are two examples: in the first, the regex does what is expected; in the second, it matches but shouldn't, because the ; is inside quotes.

First Test (success)

test;"123";"this description with new line feed  below should be
matched by regex";test;"1.0"
 

Second Test (error)

NewLine1;"test - this one should not be captured by the regex but its being captured ";test;1
NewLine2;"test that went wrong"

Is there a way to match the text between quotes, requiring a semicolon before the first quote and a semicolon after the last quote, while ignoring semicolons inside the quotes? I think that's what I need, so that the second example is not taken into account by the regex match.

Thank you in advance.


Solution

  • You may use:

    (^|;)"(?:""|[^";])*[\n\r]+(?:""|[^";])*"

    I changed [;] to ; because the two are equivalent here. I also added ; to the negated class, giving [^";], because a column value in your CSV stream can't contain that character.

    I don't know why you have "" in the regex, but if the goal is to allow double quotes inside the column value, I assume they must be escaped with \. In that case you can use a regex like (^|;)"(?:(?<=\\)"|[^";])*[\n\r]+(?:(?<=\\)"|[^";])*", which has (?<=\\)" in place of "": a " character preceded by a backslash (\").
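
    To show how either pattern could be used for the actual replacement, here is a minimal sketch. It assumes the text to clean is already in a single string (a quoted value split across two stream chunks is not handled) and that the Node.js version supports lookbehind. The helper name flattenQuotedNewlines, the constant names, and the single-space replacement are hypothetical choices for illustration, not part of the question.

```javascript
// Pattern from the answer, plus the g flag so every quoted field is visited.
const DOUBLED_QUOTES = /(^|;)"(?:""|[^";])*[\n\r]+(?:""|[^";])*"/g;
// Variant from the answer for backslash-escaped quotes (\").
const BACKSLASH_QUOTES = /(^|;)"(?:(?<=\\)"|[^";])*[\n\r]+(?:(?<=\\)"|[^";])*"/g;

// Hypothetical helper: for each quoted field the pattern matches,
// replace every run of \n / \r inside it with `replacement`.
function flattenQuotedNewlines(text, pattern = DOUBLED_QUOTES, replacement = ' ') {
  return text.replace(pattern, (field) => field.replace(/[\n\r]+/g, replacement));
}

const row = 'test;"123";"first line\nsecond line";test;"1.0"';
console.log(flattenQuotedNewlines(row));
// test;"123";"first line second line";test;"1.0"

const escaped = 'a;"say \\"hi\\"\nbye";b';
console.log(flattenQuotedNewlines(escaped, BACKSLASH_QUOTES));
// a;"say \"hi\" bye";b
```

    Because ; is excluded from the quoted part, a quoted value that contains both a ; and a line break is not matched and is left untouched. That is consistent with the assumption above that a column value can't contain ;.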