I have a string like this:
−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou "výborné" <značky>?!.
And I want to end up with this:
häagen-dazs a le & co jsou výborné značky.
Unlike in "How to filter string for unwanted characters using regex?", I want to keep the accents (diacritics) in the string.
I use the following replaceAll:
str = str.replaceAll("[¨%=;\\:\\(\\)\\$\\[\\]\\{\\}\\<\\>\\+\\*\\−\\@\\#\\~\\?\\!\\^\\'\\\"\\|\\/]", "");
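The blacklist above has to name every unwanted symbol. A minimal sketch of the opposite approach, assuming "wanted" means any Unicode letter (\p{L}), combining mark (\p{M}), whitespace, '&', '.' and '-': a negated character class removes everything else in a single replaceAll call. The class name WhitelistReplace and the method clean are hypothetical, and the source file is assumed to be compiled as UTF-8 (the default since JDK 18).

```java
public class WhitelistReplace {
    // Remove every character that is NOT a letter, a combining mark,
    // whitespace, '&', '.' or '-' (a lone '&' is literal inside a class)
    static String clean(String s) {
        return s.replaceAll("[^\\p{L}\\p{M}\\s&.-]", "");
    }

    public static void main(String[] args) {
        String str = "−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou \"výborné\" <značky>?!.";
        System.out.println(clean(str));
    }
}
```

On the sample string this prints "-häagen-dazs a le & co jsou výborné značky.". The accents survive, but so does the ASCII '-' near the start of the input, because '-' has to stay allowed for "häagen-dazs"; dropping it would need an extra rule.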
You can loop through all the characters of the input String and test each one individually against a regex of wanted characters, keeping it only if it matches. Use this regex for the per-character test: [a-zA-Z& \\-_\\.ýčéèêàâùû]
This is the code you need:
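public class FilterChars {
    public static void main(String[] args) {
        String input = "−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou výborné <značky>?!";
        // StringBuffer collects the characters that pass the test
        StringBuffer sb = new StringBuffer();
        for (char c : input.toCharArray()) {
            // Lower-case the char, then keep it only if it matches the allowed-character class
            if ((Character.toString(c).toLowerCase()).matches("[a-zA-Z& \\-_\\.ýčéèêàâùû]")) {
                sb.append(c);
            }
        }
        System.out.println(sb.toString());
    }
}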
Demo output:
-hagen-dazs. a le & co jsou výborné značky
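This output differs from the wanted result in two places: 'ä' is not listed in the allowed class, so "häagen" becomes "hagen", and the class allows the ASCII '-' (via \-), so the hyphen near the start of the input is kept. A quick check of single characters against the answer's class; ClassCheck is a hypothetical wrapper, and the source is assumed to be compiled as UTF-8.

```java
public class ClassCheck {
    // The allowed-character class from the answer, unchanged
    static final String ALLOWED = "[a-zA-Z& \\-_\\.ýčéèêàâùû]";

    public static void main(String[] args) {
        System.out.println("ä".matches(ALLOWED)); // false: 'ä' is not listed, so it is dropped
        System.out.println("é".matches(ALLOWED)); // true: 'é' is listed, so "výborné" survives
        System.out.println("-".matches(ALLOWED)); // true: the escaped hyphen is allowed
    }
}
```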
Note:
- input.toCharArray() gets an array of chars to loop over.
- (Character.toString(c).toLowerCase()).matches("[a-zA-Z& \\-_\\.ýčéèêàâùû]") tests whether the current char matches the allowed-characters regex.
- StringBuffer builds a new String containing only the allowed characters.
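String.matches delegates to Pattern.matches, which compiles the regex on every call, so the loop above recompiles the class once per character. A sketch of the same filter with the Pattern compiled once and StringBuilder (the unsynchronized counterpart of StringBuffer) used instead; PrecompiledFilter and filter are hypothetical names, and Character.toLowerCase(char) stands in for String.toLowerCase() to avoid locale-dependent lowercasing. The allowed class is unchanged, so 'ä' is still dropped.

```java
import java.util.regex.Pattern;

public class PrecompiledFilter {
    // Compile the allowed-character class once instead of on every String.matches call
    private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z& \\-_\\.ýčéèêàâùû]");

    static String filter(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            // Lower-case first so upper-case accented letters (e.g. 'É') match their listed lower-case form
            if (ALLOWED.matcher(String.valueOf(Character.toLowerCase(c))).matches()) {
                sb.append(c); // append the original char, not the lower-cased one
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("−+-~*/@$^#¨%={}[häagen-dazs;:] a (le & co') jsou výborné <značky>?!"));
    }
}
```

The filtering rule is the same as in the answer's loop; only the regex compilation and the buffer type change.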