public class test {
    public static void main(String[] args) {
        String test1 = "N&oslash;rrebro, Denmark";
        String test2 = "&oslash;";
        String regex = "^&\\S*;$";
        String value = test1.replaceAll(regex, "");
        System.out.println(test2.matches(regex));
        System.out.println(value);
    }
}
This gives me the following output:
true
N&oslash;rrebro, Denmark
How is that possible? Why does replaceAll() not register a match?
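It is possible because the ^&\S*;$ pattern matches the entire &oslash; string, but it does not match the entire N&oslash;rrebro, Denmark string. The ^ anchor requires the & to appear right at the start of the string, and $ requires the ; to appear right at the end of the string. Since N&oslash;rrebro, Denmark starts with N and ends with k, there is nothing for replaceAll() to replace.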
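A small sketch of the anchor behaviour described above. The class name AnchorDemo and the helper containsAnchoredMatch are hypothetical, introduced only for illustration. It shows that even Matcher.find(), which searches anywhere in the input, still honours ^ and $, so the anchored pattern cannot match inside the longer string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // Same anchored pattern as in the question.
    static final Pattern ANCHORED = Pattern.compile("^&\\S*;$");

    // Hypothetical helper: searches the input with find(); ^ and $ still
    // pin the match to the very start and very end of the input.
    static boolean containsAnchoredMatch(String input) {
        Matcher m = ANCHORED.matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        String entity = "&oslash;";
        String sentence = "N&oslash;rrebro, Denmark";

        System.out.println(entity.matches("^&\\S*;$"));          // true
        System.out.println(containsAnchoredMatch(sentence));      // false: "N" precedes "&"
        System.out.println(sentence.replaceAll("^&\\S*;$", ""));  // N&oslash;rrebro, Denmark
    }
}
```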
Just removing the ^ and $ anchors may not work, because \S* is a greedy pattern and may overmatch, e.g. in N&oslash;rrebro;, where &\S*; matches all of &oslash;rrebro; instead of just &oslash;.
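To see the overmatch, here is a minimal sketch comparing the unanchored greedy pattern &\S*; with the lazy &\S+?; form mentioned in this answer on the input N&oslash;rrebro;. The class GreedyDemo and its strip helper are hypothetical names used only for this illustration.

```java
public class GreedyDemo {
    // Hypothetical helper: removes every match of the regex from the input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "N&oslash;rrebro;";
        // Greedy: \S* runs to the end of the input, then backtracks only as far
        // as the LAST ';', so "&oslash;rrebro;" is removed.
        System.out.println(strip(input, "&\\S*;"));   // N
        // Lazy: \S+? stops at the FIRST ';' after '&', so only "&oslash;" is removed.
        System.out.println(strip(input, "&\\S+?;"));  // Nrrebro;
    }
}
```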
You may use the &\w+; or &\S+?; pattern instead, e.g.:
String test1 = "N&oslash;rrebro, Denmark";
String regex = "&\\w+;";
String value = test1.replaceAll(regex,"");
System.out.println(value); // => Nrrebro, Denmark
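The &\w+; pattern matches a &, then one or more word characters, and then a ;, anywhere inside the string. In &\S+?;, the lazy \S+? matches one or more characters other than whitespace, as few as possible, so the match ends at the first ; after the &.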
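The two suggested patterns are not interchangeable. A sketch, assuming a numeric entity such as &#248; might also appear in the input (that example string is an assumption, not from the question): since # is not a word character, &\w+; leaves a numeric entity in place, while &\S+?; removes it. EntityPatternDemo and strip are hypothetical names.

```java
public class EntityPatternDemo {
    // Hypothetical helper: removes every match of the regex from the input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String named = "N&oslash;rrebro, Denmark";  // named entity, as in the question
        String numeric = "N&#248;rrebro, Denmark";  // numeric entity (assumed example)

        System.out.println(strip(named, "&\\w+;"));     // Nrrebro, Denmark
        System.out.println(strip(named, "&\\S+?;"));    // Nrrebro, Denmark
        // '#' is not a word character, so &\w+; does not match "&#248;"
        System.out.println(strip(numeric, "&\\w+;"));   // N&#248;rrebro, Denmark
        System.out.println(strip(numeric, "&\\S+?;"));  // Nrrebro, Denmark
    }
}
```

Pick &\w+; if you only expect named entities, and &\S+?; if any non-whitespace run between & and ; should be removed.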