java regex regexp-replace

Regexp: replace a specific line break in a String


I am looking for a regexp that finds a specific line break \n in a long String.

The specific \n is the one before a line that does not contain a specific char: '#'

For example:

this tis a fine #line1\nthis tis another fine #line2\nthis_belongs_to abobe line\nthis tis still is OK #line4

that represents the text:

this tis a fine #line1
this tis another fine #line2
this_belongs_to abobe line
this tis still is OK #line4

Here the \n to be removed is the one after #line2, resulting in the text:

this tis a fine #line1
this tis another fine #line2this_belongs_to abobe line
this tis still is OK #line4

I came up with a regexp like \n^(?m)(?!.*#).*$ which is close, but I can't figure out how to build one that matches and removes only the right line break and preserves the rest of the String.

Perhaps there is a better way than using a regular expression?


Solution

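  • You can use either of these:

    // Removes a line break when the line after it contains no '#'
    text = text.replaceAll("\\R(?!.*#)", "");
    // Removes a line break when the line after it is non-empty and contains no '#'
    text = text.replaceAll("(?m)\\R(?=[^\n#]+$)", "");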
    

    Details:

    • (?m) - the Pattern.MULTILINE embedded flag option, which makes $ in this pattern match at the end of a line, not only at the end of the whole string
    • \R - any line break sequence
    • (?!.*#) - a negative lookahead that fails the match if the current location is followed by zero or more chars other than line break chars, as many as possible, and then a # char (that is, if the next line contains a #)
    • (?=[^\n#]+$) - a positive lookahead that requires one or more chars other than an LF and # up to the end of the line (replace + with * to also match before an empty line).
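
To make the + versus * remark concrete, here is a small sketch that runs both variants on an input containing an empty line. The class and method names (EmptyLineLookahead, joinNonEmpty, joinAlsoEmpty) and the sample string are made up for illustration and are not part of the original answer.

```java
public class EmptyLineLookahead {
    // '+' variant from the answer: the following line must be non-empty and free of '#'.
    static String joinNonEmpty(String text) {
        return text.replaceAll("(?m)\\R(?=[^\n#]+$)", "");
    }

    // '*' variant: the following line may also be empty.
    static String joinAlsoEmpty(String text) {
        return text.replaceAll("(?m)\\R(?=[^\n#]*$)", "");
    }

    public static void main(String[] args) {
        // Hypothetical input: an empty line after "a #1" and a '#'-free last line.
        String s = "a #1\n\nb #2\nc";
        System.out.println(joinNonEmpty(s));   // keeps the empty line, joins "c" onto "b #2"
        System.out.println(joinAlsoEmpty(s));  // also joins the empty line onto "a #1"
    }
}
```

One thing to watch with the * variant: a line break at the very end of the text is followed by an empty remainder, so it is removed as well.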

    Java demo:

    public class Main {
        public static void main(String[] args) {
            // The same sample text, once with LF and once with CRLF line endings
            String s_lf = "this tis a fine #line1\nthis tis another fine #line2\nthis_belongs_to abobe line\nthis tis still is OK #line4";
            String s_crlf = "this tis a fine #line1\r\nthis tis another fine #line2\r\nthis_belongs_to abobe line\r\nthis tis still is OK #line4";

            // Negative lookahead pattern
            System.out.println(s_lf.replaceAll("\\R(?!.*#)", ""));
            System.out.println(s_crlf.replaceAll("\\R(?!.*#)", ""));

            // Positive lookahead pattern with the MULTILINE flag
            System.out.println(s_lf.replaceAll("(?m)\\R(?=[^\n#]+$)", ""));
            System.out.println(s_crlf.replaceAll("(?m)\\R(?=[^\n#]+$)", ""));
        }
    }
    

    All test cases, with both LF and CRLF line endings, print

    this tis a fine #line1
    this tis another fine #line2this_belongs_to abobe line
    this tis still is OK #line4
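
The question also asks whether there is a way that avoids a lookahead regex altogether. One possible sketch, not taken from the answer above: split the text into lines and append every line without a '#' to the line before it. The names JoinContinuationLines and joinContinuations are hypothetical, and the sketch assumes it is acceptable to write every kept line break back as a plain "\n" (the split still uses the trivial pattern \r?\n to accept both LF and CRLF input).

```java
public class JoinContinuationLines {
    // Appends each line that contains no '#' to the preceding line.
    // Kept line breaks are written back as "\n", whatever the input used.
    static String joinContinuations(String text) {
        // limit -1 keeps trailing empty strings, so no line is silently dropped
        String[] lines = text.split("\r?\n", -1);
        StringBuilder out = new StringBuilder(lines[0]);
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].indexOf('#') >= 0) {
                out.append('\n');   // a line with '#' starts a new output line
            }
            out.append(lines[i]);   // a line without '#' is glued to the previous one
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "this tis a fine #line1\nthis tis another fine #line2\n"
                 + "this_belongs_to abobe line\nthis tis still is OK #line4";
        System.out.println(joinContinuations(s));
    }
}
```

Like the \R(?!.*#) pattern, this treats an empty line as a line without '#', so it is joined to the previous line too; add an isEmpty() check in the loop if empty lines should be preserved.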