I have a requirement where I need to remove two specific control characters, ^@ and ^M, from the incoming data in Java on a Linux box.
The following replacements work as expected:
String s;
s = s.replaceAll("\\x00","as");
s = s.replaceAll("\\000", "as");
but these don't:
s = s.replaceAll("\\015", "as"); //Octal
s = s.replaceAll("\\x0D", "as"); //Hex
I have tried all available representations (octal/hex/Unicode), including \r, to represent ^M in my code, but none of them work. As mentioned above, everything works fine for other control characters.
Please suggest if there's anything that I haven't tried or missed.
Edit: Here is a runnable example, as requested.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class sampSC {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("./samp1.txt"));
        try {
            StringBuilder sb = new StringBuilder();
            String line = br.readLine();
            while (line != null) {
                sb.append(line);
                line = br.readLine();
            }
            String s = sb.toString();
            System.out.println(s);
            s = s.replaceAll("\\00", "sb"); //works
            System.out.println(s);
            s = s.replaceAll("\\x11", "s23b"); //works
            System.out.println(s);
            s = s.replaceAll("\\r$", "aa"); //doesn't work
            System.out.println(s);
        } finally {
            br.close();
        }
    }
}
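To summarize the comments: the file is read in line by line with BufferedReader.readLine(). readLine() treats \n, \r, and \r\n as line terminators and returns each line without them, so the ^M (\r) never makes it into the String that is later searched. The regex is fine; there is simply nothing left for it to match.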
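To make that concrete, here is a minimal sketch comparing the two ways of reading. The class name ControlCharDemo and its helper methods are made up for illustration, and it reads from an in-memory string rather than samp1.txt so it can be run anywhere; treat it as one possible approach, not the only fix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; names are illustrative only.
class ControlCharDemo {

    // Reads line by line, as in the question. readLine() drops \r and \n.
    static String readByLine(Reader in) {
        try {
            BufferedReader br = new BufferedReader(in);
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads raw characters in chunks, so \r and \n are preserved.
    static String readAll(Reader in) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Removes NUL (^@) and carriage return (^M) in a single pass.
    static String stripControls(String s) {
        return s.replaceAll("[\\x00\\r]", "");
    }

    public static void main(String[] args) {
        String data = "ab\0c\r\nde\r\n"; // sample input with ^@ and ^M

        // The \r is already gone before any regex runs.
        System.out.println(readByLine(new StringReader(data)).contains("\r")); // false

        // Reading raw characters keeps the \r, so the regex has something to match.
        String raw = readAll(new StringReader(data));
        System.out.println(raw.contains("\r"));                 // true
        System.out.println(stripControls(raw).contains("\r"));  // false
    }
}
```

If reading line by line is fine for your use case, the ^M is already removed for you and only the ^@ replacement is needed. Note that the loop in the question also drops the \n, so all lines end up joined together.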