Java regex matches but String.replaceAll() doesn't replace matching substrings


public class test {
    public static void main(String[] args) {
        String test1 = "N&oslash;rrebro, Denmark";
        String test2 = "&oslash;";
        String regex = "^&\\S*;$";
        String value = test1.replaceAll(regex, "");
        System.out.println(test2.matches(regex)); // whole-string match
        System.out.println(value);                // result of replaceAll()
    }
}

This gives me the following output:

true
N&oslash;rrebro, Denmark

How is that possible? Why does replaceAll() not register a match?


Solution

  • This happens because the ^&\S*;$ pattern matches the entire &oslash; string, but it does not match the entire N&oslash;rrebro, Denmark string. The ^ anchor requires the & to be at the very start of the string, and $ requires the ; to be right at the end of the string. replaceAll() searches for the pattern inside test1, and no part of test1 satisfies both anchors.

    Just removing the ^ and $ anchors may not work either, because \S* is a greedy pattern and may overmatch, e.g. in N&oslash;rrebro;, where the match runs through to the last ; in the string.
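
    Both points can be checked with a small sketch. The class name AnchorDemo and its helper methods are made up for illustration, and the inputs assume the entity appears literally as &oslash; in the strings:

    ```java
    import java.util.regex.Pattern;

    // Hypothetical demo class; the names are illustrative only.
    class AnchorDemo {
        // True if the regex is found anywhere in the input, which is the
        // kind of search replaceAll() performs.
        static boolean foundIn(String regex, String input) {
            return Pattern.compile(regex).matcher(input).find();
        }

        // The question's pattern with the ^ and $ anchors removed (still greedy).
        static String stripGreedy(String input) {
            return input.replaceAll("&\\S*;", "");
        }

        public static void main(String[] args) {
            // Anchored: found only when the entity is the whole string.
            System.out.println(foundIn("^&\\S*;$", "&oslash;"));                 // true
            System.out.println(foundIn("^&\\S*;$", "N&oslash;rrebro, Denmark")); // false

            // Unanchored but greedy: \S* runs on to the last ';' it can reach.
            System.out.println(stripGreedy("N&oslash;rrebro, Denmark")); // Nrrebro, Denmark
            System.out.println(stripGreedy("N&oslash;rrebro;"));         // N
        }
    }
    ```

    The last call shows the overmatch: a later ; with no whitespace in between pulls the rest of the word into the match.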

    You may use the &\w+; or the &\S+?; pattern instead, e.g.:

    String test1 = "N&oslash;rrebro, Denmark";
    String regex = "&\\w+;";
    String value = test1.replaceAll(regex, "");
    System.out.println(value); // => Nrrebro, Denmark
    

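    The &\w+; pattern matches a &, then one or more word characters, and then a ;, anywhere inside the string. In the &\S+?; alternative, \S+? matches one or more non-whitespace characters, as few as possible, so the match stops at the first ; it reaches.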
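
As a rough comparison of the two suggested patterns, here is a minimal sketch. The class EntityStripDemo, its method names, and the numeric-entity input &#248; are assumptions added for illustration and are not part of the original question:

```java
// Hypothetical demo class; the names are illustrative only.
class EntityStripDemo {
    // Removes entities whose name consists of word characters, e.g. &oslash;
    static String stripWord(String input) {
        return input.replaceAll("&\\w+;", "");
    }

    // Removes the shortest run of non-whitespace characters between & and ;
    static String stripLazy(String input) {
        return input.replaceAll("&\\S+?;", "");
    }

    public static void main(String[] args) {
        // Neither pattern overmatches when another ';' follows.
        System.out.println(stripWord("N&oslash;rrebro;")); // Nrrebro;
        System.out.println(stripLazy("N&oslash;rrebro;")); // Nrrebro;

        // '#' is not a word character, so only the lazy pattern removes this one.
        System.out.println(stripWord("K&#248;benhavn")); // K&#248;benhavn
        System.out.println(stripLazy("K&#248;benhavn")); // Kbenhavn
    }
}
```

Which pattern fits better probably depends on whether numeric entities can occur in the input: &\w+; is stricter, while &\S+?; also accepts characters such as #.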