Strange behaviours of the String's contains and replaceAll methods with special characters


I experimented a little with String's contains and replaceAll methods:

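char c = '*';

// Five copies of c between "1220" and "23", e.g. "1220*****23"
String str = "1220" + c + c + c + c + c + "23";

// contains with the bare, bracketed, and backslash-escaped character
System.out.println(str.contains(c + ""));
System.out.println(str.contains("[" + c + "]"));
System.out.println(str.contains("\\" + c));

// replaceAll with the bracketed, backslash-escaped, and bare character
System.out.println(str.replaceAll("[" + c + "]", "X"));
System.out.println(str.replaceAll("\\" + c, "X"));
System.out.println(str.replaceAll(c + "", "X"));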

Results:

When c = '*' or '^' or '+'

true
false
false
1220XXXXX23
1220XXXXX23
java.util.regex.PatternSyntaxException

When c = '#' or '~' or '%' or '<' or '>' or '=' or '&' or '@' or '-' or '!'

true
false
false
1220XXXXX23
1220XXXXX23
1220XXXXX23

When c = '$'

true
false
false
1220XXXXX23
1220XXXXX23
1220$$$$$23X

When c = '|'

true
false
false
1220XXXXX23
1220XXXXX23
X1X2X2X0X|X|X|X|X|X2X3X

What is the theory or rule behind this?


Solution

  • The argument of contains and the first argument of replaceAll are interpreted differently: the former is just a literal character sequence, while the latter is a regular expression. Since * is a metacharacter of Java's regex language that cannot appear unescaped on its own (it must follow the expression that is to be repeated zero or more times), the two methods treat it differently.
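The same rule accounts for the other results in the question: as a regex, $ matches the end of the input and a lone | matches the empty string at every position, while characters such as # have no special meaning. The following is a minimal sketch of two ways to replace a character literally regardless of its regex meaning, assuming that is the goal. The class name LiteralVersusRegex and the helpers replaceLiteral and replaceQuoted are made up for illustration; String.replace, Pattern.quote and Matcher.quoteReplacement are standard JDK methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LiteralVersusRegex {

    // String.replace(CharSequence, CharSequence) never interprets its
    // arguments as regular expressions.
    static String replaceLiteral(String s, char c, String replacement) {
        return s.replace(String.valueOf(c), replacement);
    }

    // replaceAll still compiles a regex, so the pattern is quoted first.
    // The replacement is quoted too, because $ and \ are special there.
    static String replaceQuoted(String s, char c, String replacement) {
        return s.replaceAll(Pattern.quote(String.valueOf(c)),
                            Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        for (char c : new char[] {'*', '+', '$', '|', '#'}) {
            String str = "1220" + c + c + c + c + c + "23";
            // Both lines print 1220XXXXX23 for every character in the list.
            System.out.println(replaceLiteral(str, c, "X"));
            System.out.println(replaceQuoted(str, c, "X"));
        }

        // Unquoted, the same characters are regex syntax:
        System.out.println("1220$$$$$23".replaceAll("$", "X")); // 1220$$$$$23X
        try {
            "1220*****23".replaceAll("*", "X");
        } catch (PatternSyntaxException e) {
            // A bare * has nothing to repeat, so the pattern does not compile.
            System.out.println("invalid pattern: " + e.getDescription());
        }
    }
}
```

When no regex features are needed, String.replace is usually the simpler choice; quoting is mainly useful when the literal text has to be embedded in a larger pattern.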