I want a method in the following format:
public boolean isValidHtmlEscapeCode(String string);
Usage would be:
isValidHtmlEscapeCode("A") == false
isValidHtmlEscapeCode("&#1513;") == true // Valid unicode character
isValidHtmlEscapeCode("&#x5E9;") == true // same as 1513 but in HEX
isValidHtmlEscapeCode("�") == false // Invalid unicode character
I wasn't able to find anything that does this. Is there an existing utility for it? If not, is there a smart way to implement it?
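    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Compiled once: group 1 = hex reference, group 2 = decimal reference,
    // group 3 = entity name.
    private static final Pattern ENTITY = Pattern
            .compile("&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([0-9A-Za-z]+));");

    public static boolean isValidHtmlEscapeCode(String string) {
        if (string == null) {
            return false;
        }
        Matcher m = ENTITY.matcher(string);
        // matches() requires the whole string to be a single escape code;
        // find() would also accept strings that merely contain one.
        if (!m.matches()) {
            return false;
        }
        String entity;
        int codePoint;
        try {
            if ((entity = m.group(1)) != null) {
                // Hex: 0x10FFFF has 6 digits, so longer input is rejected.
                if (entity.length() > 6) {
                    return false;
                }
                codePoint = Integer.parseInt(entity, 16);
            } else if ((entity = m.group(2)) != null) {
                // Decimal: 1114111 (0x10FFFF) has 7 digits.
                if (entity.length() > 7) {
                    return false;
                }
                codePoint = Integer.parseInt(entity, 10);
            } else {
                // Named entity: namedEntities is a Set<String> of known names.
                return namedEntities.contains(m.group(3));
            }
        } catch (NumberFormatException e) {
            return false;
        }
        // Valid code points are 0..0x10FFFF, excluding surrogates 0xD800..0xDFFF.
        return 0x00 <= codePoint && codePoint < 0xd800
                || 0xdfff < codePoint && codePoint <= 0x10FFFF;
    }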
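The range check at the end of the method accepts any code point from 0 to 0x10FFFF except the surrogate block 0xD800 to 0xDFFF. As a rough sketch, the same test can be written with constants from java.lang.Character instead of hard-coded numbers. The class and method names here (CodePointCheck, isScalarValue) are made up for illustration and are not part of the code above.

```java
class CodePointCheck {
    // True for code points in 0..0x10FFFF that are not surrogates (0xD800..0xDFFF).
    public static boolean isScalarValue(int codePoint) {
        return Character.isValidCodePoint(codePoint)
                && (codePoint < Character.MIN_SURROGATE
                    || codePoint > Character.MAX_SURROGATE);
    }

    public static void main(String[] args) {
        System.out.println(isScalarValue(1513));     // true  (the character from the question)
        System.out.println(isScalarValue(0xD800));   // false (surrogate)
        System.out.println(isScalarValue(0x10FFFF)); // true  (highest code point)
        System.out.println(isScalarValue(0x110000)); // false (past the end of Unicode)
    }
}
```

Whether this reads better than the explicit comparisons is a matter of taste; the behaviour should be the same for every int.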
The namedEntities set used above is built from this list of named entities: http://pastebin.com/XzzMYDjF
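The method assumes a namedEntities collection that holds the entity names without the leading & and trailing ;. A minimal sketch of how such a set might be declared is below. It contains only a handful of names as a placeholder, not the full list from the link, and the class name NamedEntities and helper isNamedEntity are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class NamedEntities {
    // Tiny illustrative subset; a real set would hold every name from the full list.
    static final Set<String> NAMED_ENTITIES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("amp", "lt", "gt", "quot", "apos", "nbsp", "copy")));

    // Lookup is case-sensitive and expects the bare name, e.g. "amp" for "&amp;".
    public static boolean isNamedEntity(String name) {
        return name != null && NAMED_ENTITIES.contains(name);
    }

    public static void main(String[] args) {
        System.out.println(isNamedEntity("amp"));   // true
        System.out.println(isNamedEntity("bogus")); // false
    }
}
```

A HashSet keeps the lookup at constant time on average, which matters little for a single call but is convenient if the check runs over many strings.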