How can I match characters (with the intention of removing them) from outside the Unicode Basic Multilingual Plane in Java?
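Java's regex engine matches by code point rather than by individual UTF-16 char, so a negated character class covering U+0000 to U+FFFF matches each supplementary character as a whole surrogate pair. To remove all non-BMP characters, the following should work: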
String sanitizedString = inputString.replaceAll("[^\u0000-\uFFFF]", "");
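
For a version that is easier to test, here is a minimal sketch. The class name `BmpSanitizer` and its method names are made up for illustration. It assumes Java 8 or later (the `\x{...}` regex syntax needs Java 7, and `codePoints()` needs Java 8). The first method spells out the supplementary range explicitly instead of negating the BMP range; the second avoids regex entirely.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class BmpSanitizer {

    // Matches any single code point from U+10000 to U+10FFFF,
    // i.e. everything outside the Basic Multilingual Plane.
    private static final Pattern NON_BMP =
            Pattern.compile("[\\x{10000}-\\x{10FFFF}]");

    // Regex-based removal; the pattern is compiled once and reused.
    static String removeNonBmp(String input) {
        return NON_BMP.matcher(input).replaceAll("");
    }

    // Regex-free alternative: walk the string by code point and
    // keep only code points that fit in the BMP.
    static String removeNonBmpManually(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
             .filter(Character::isBmpCodePoint)
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as a surrogate pair, then 'b'
        String input = "a\uD83D\uDE00b";
        System.out.println(removeNonBmp(input));         // prints "ab"
        System.out.println(removeNonBmpManually(input)); // prints "ab"
    }
}
```

Both methods leave unpaired surrogates in place, since a lone surrogate is itself a code point in the BMP range; if those should go too, they would need a separate check.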