Tags: regex, csv, java-11, carriage-return, linefeed

Java regexp to remove CRLF between quotes


I have a string containing CSV lines. Some of its values contain CRLF characters, marked [CRLF] in the example below.

NOTE: the "Line 1:" and "Line 2:" labels aren't part of the CSV; they are only there for the discussion.

Line 1: 
foo1,bar1,"john[CRLF]
dose[CRLF]
blah[CRLF]
blah",harry,potter[CRLF]
Line 2:
foo2,bar2,john,dose,blah,blah,harry,potter[CRLF]

Whenever a value in a line contains a CRLF, the whole value appears between quotes, as shown by line 1. I'm looking for a way to get rid of those CRLFs when they appear between quotes.

I tried regexps such as:

data.replaceAll("(,\".*)([\r\n]+|[\n\r]+)(.*\",)", "$1 $3");

Or just ([\r\n]+), \n+, etc., without success: the lines still appear as if no replacement had been made.
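One likely reason the first attempt changes nothing, sketched below: by default `.` does not match line terminators, so `(.*\",)` can only succeed when the closing quote sits on the line right after the first line break. The class and method names here are made up for illustration, and the sample data is the one above with [CRLF] written as \r\n.

```java
public class FailedAttemptDemo {
    // Sample from the question, with [CRLF] written as \r\n.
    static final String DATA =
        "foo1,bar1,\"john\r\ndose\r\nblah\r\nblah\",harry,potter\r\n"
      + "foo2,bar2,john,dose,blah,blah,harry,potter\r\n";

    // The pattern from the attempt, unchanged.
    static final String ATTEMPT = "(,\".*)([\r\n]+|[\n\r]+)(.*\",)";

    static String applyAttempt(String data) {
        // Strings are immutable, so the returned value has to be kept.
        return data.replaceAll(ATTEMPT, "$1 $3");
    }

    public static void main(String[] args) {
        // The quoted value spans four lines, so the pattern never matches.
        System.out.println(applyAttempt(DATA).equals(DATA));

        // A value with a single line break does match and is joined.
        System.out.println(applyAttempt("a,\"x\r\ny\",b"));
    }
}
```

Note also that `data.replaceAll(...)` returns a new string; if the result is not assigned, `data` itself stays as it was.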

EDIT:

Solution

Found the solution here:

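import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "\"Test Line wo line break\", \"Test Line \nwith line break\"\n\"Test Line2 wo line break\", \"Test Line2 \nwith line break\"\n";
StringBuffer result = new StringBuffer();
// Match each quoted value; [^\"] also matches line breaks, so a value may span several lines
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(data);
while (m.find()) {
    // quoteReplacement keeps any '$' or '\' in the value from being read as a group reference
    m.appendReplacement(result, Matcher.quoteReplacement(m.group().replaceAll("\\R+", "")));
}
m.appendTail(result);
System.out.println(result.toString());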
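As a rough sketch, the same loop can be applied to the CSV sample from the question. The helper name `replaceLineBreaksInQuotes` and its `replacement` parameter are hypothetical; passing a space mirrors the `$1 $3` of the original attempt, while an empty string reproduces the code above. It assumes Java 9+ for `appendReplacement(StringBuilder, String)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedLineBreaks {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Hypothetical helper: replaces each run of line breaks inside a quoted
    // value with `replacement`; line breaks that end a record are untouched.
    static String replaceLineBreaksInQuotes(String csv, String replacement) {
        StringBuilder result = new StringBuilder();
        Matcher m = QUOTED.matcher(csv);
        while (m.find()) {
            String cleaned = m.group()
                .replaceAll("\\R+", Matcher.quoteReplacement(replacement));
            // Quote again so '$' or '\' in the value is inserted literally.
            m.appendReplacement(result, Matcher.quoteReplacement(cleaned));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // Sample from the question, with [CRLF] written as \r\n.
        String data =
            "foo1,bar1,\"john\r\ndose\r\nblah\r\nblah\",harry,potter\r\n"
          + "foo2,bar2,john,dose,blah,blah,harry,potter\r\n";
        System.out.print(replaceLineBreaksInQuotes(data, " "));
    }
}
```

Since `\R` treats `\r\n` as a single line break, each CRLF inside the quoted value becomes one space, and the CRLF at the end of each record is left in place because it lies outside any quoted match.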

Solution

  • With Java 9+ you can pass a replacement function (a lambda) to Matcher#replaceAll and solve your problem with this code:

    // pattern that matches quoted strings, skipping over backslash-escaped quotes
    Pattern p = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");
    
    String data1 = "\"Test Line wo line break\", \"Test Line \nwith line break\"\n\"Test Line2 wo line break\", \"Test Line2 \nwith line break\"\n";
    
    // for each quoted string, remove all line breaks from the matched substring;
    // quoteReplacement keeps '\' and '$' in the match from being treated specially
    String repl = p.matcher(data1).replaceAll(
       m -> Matcher.quoteReplacement(m.group().replaceAll("\\R+", ""))
    );
    
    System.out.println(repl);
    

    Output:

    "Test Line wo line break", "Test Line with line break"
    "Test Line2 wo line break", "Test Line2 with line break"
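To illustrate where the two patterns differ, here is a small sketch with a value that contains backslash-escaped quotes. The class and method names are invented for the example, and the input is an assumed test string, not data from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuoteDemo {
    // Simple pattern: a quote, anything but a quote, a quote.
    static final Pattern SIMPLE = Pattern.compile("\"[^\"]*\"");
    // Escape-aware pattern: also steps over backslash-escaped characters.
    static final Pattern ESCAPE_AWARE =
        Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Removes runs of line breaks inside every match of `quoted` (Java 9+).
    static String strip(Pattern quoted, String csv) {
        return quoted.matcher(csv).replaceAll(
            m -> Matcher.quoteReplacement(m.group().replaceAll("\\R+", "")));
    }

    public static void main(String[] args) {
        // The value contains \" twice, with a CRLF between them.
        String in = "\"say \\\"hi\r\nthere\\\" ok\",next\r\n";

        // The simple pattern stops at the first escaped quote, so the
        // line break falls between two matches and is left in place.
        System.out.println(strip(SIMPLE, in).equals(in));

        // The escape-aware pattern matches the whole value and removes it.
        System.out.print(strip(ESCAPE_AWARE, in));
    }
}
```

If the data never uses backslash escapes, the simpler pattern is enough; CSV files that escape a quote by doubling it ("") are split into adjacent quoted pieces by either pattern, so line breaks inside them are still removed.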
    
