java regex unicode astral-plane

Java regex match characters outside Basic Multilingual Plane


How can I match characters outside the Unicode Basic Multilingual Plane (BMP) in Java, with the intention of removing them?


Solution

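  • To remove all non-BMP characters, negate the BMP range in a character class. Java's regex engine matches by code point, so each supplementary character (stored as a surrogate pair in the string) is matched as a single unit. The following should work: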

    String sanitizedString = inputString.replaceAll("[^\u0000-\uFFFF]", "");
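
For reference, here is a minimal, self-contained sketch of the same idea. The class name `NonBmpStripper` and method name `stripNonBmp` are made up for illustration. Instead of negating the BMP range, it spells out the supplementary range with the `\x{...}` code point escape, which assumes Java 7 or later.

```java
import java.util.regex.Pattern;

public class NonBmpStripper {
    // Hypothetical helper, not part of the original answer.
    // Matches any code point from U+10000 to U+10FFFF (everything above the BMP).
    private static final Pattern NON_BMP =
            Pattern.compile("[\\x{10000}-\\x{10FFFF}]");

    public static String stripNonBmp(String input) {
        // Each matched supplementary code point (one surrogate pair) is dropped.
        return NON_BMP.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as its surrogate pair, then 'b'
        String input = "a\uD83D\uDE00b";
        System.out.println(stripNonBmp(input)); // prints "ab"
    }
}
```

Precompiling the pattern avoids recompiling the regex on every call, which `String.replaceAll` would do. Note that unpaired surrogates (U+D800 to U+DFFF) are themselves BMP code points, so neither pattern removes them; if the input may contain malformed UTF-16, that case would need separate handling.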