I am trying to sanitize a string in Java (it comes from a comment box) by removing special characters and anything unusual, such as emoji. The challenge is that the comment can be written in several languages, such as Chinese, Japanese, Spanish, or English. Does anyone know of a library or method to achieve this? Thanks in advance.
Here is an example of the URL-encoded value: commentText=Thanks+for+your+review%2C+Francesco+%F0%9F%AB%B6
This is the part that I would like to remove: %F0%9F%AB%B6
I'll answer my own question in case someone finds it useful. I solved this using a regular expression:
String regex = "[^\\p{L}\\p{N}\\p{P}\\p{Z}]";
String comment = "text to sanitize";
// Strings are immutable: replaceAll returns a new string, so keep the result
String sanitized = comment.replaceAll(regex, "");
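For anyone who wants to try it end to end, here is a minimal sketch applied to the example value from the question. It assumes Java 10 or later (for the `URLDecoder.decode(String, Charset)` overload) and that the value still needs URL-decoding when it reaches your code. The class name `CommentSanitizer`, the method `sanitize`, and the constant `DISALLOWED` are names I made up for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class CommentSanitizer {
    // Matches any code point that is NOT a letter, number, punctuation mark, or separator.
    private static final String DISALLOWED = "[^\\p{L}\\p{N}\\p{P}\\p{Z}]";

    // Hypothetical helper: URL-decodes the raw value, then strips disallowed characters.
    static String sanitize(String urlEncoded) {
        // Decoding turns '+' into a space and %XX sequences into UTF-8 characters.
        String decoded = URLDecoder.decode(urlEncoded, StandardCharsets.UTF_8);
        // trim() drops the space left behind where the emoji used to be.
        return decoded.replaceAll(DISALLOWED, "").trim();
    }

    public static void main(String[] args) {
        String raw = "Thanks+for+your+review%2C+Francesco+%F0%9F%AB%B6";
        System.out.println(sanitize(raw)); // prints "Thanks for your review, Francesco"
    }
}
```

Keep in mind that this pattern also removes symbols (`\p{S}`, for example `$` or `+`) and combining marks (`\p{M}`). If comments may be written in scripts that rely on combining marks, such as Thai or Hindi, adding `\\p{M}` to the allowed set is probably worth testing.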
Regex explanation: