I have a Java 21 app in which I want to determine whether a string contains an emoji. I am using the emoji API newly added in Java 21, but whenever the input String contains digits, such as "123", Character::isEmoji returns true. I have been using this as a resource: https://inside.java/2023/11/20/sip089/
This is the code I have been using:
private boolean containsEmoji(String s) {
    return s.codePoints().anyMatch(Character::isEmoji);
}
For example:
System.out.println(
    "123".codePoints().anyMatch(Character::isEmoji)
);
true
And also:
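private boolean containsEmoji(String s) {
    // Advance by whole code points so surrogate pairs are not visited twice.
    for (int i = 0; i < s.length(); ) {
        int codePoint = s.codePointAt(i);
        if (Character.isEmoji(codePoint)) {
            return true;
        }
        i += Character.charCount(codePoint);
    }
    return false;
}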
Yes, digits 0-9 in the Basic Latin (US-ASCII) block of Unicode are considered to be Emoji, for reasons that escape me.
Follow the trail of documentation:
Character.isEmoji
… lists:
0030..0039 ; Emoji # E0.0 [10] (0️..9️) digit zero..digit nine
Section 1.5.2, Versioning, of the Unicode page explains the comment E0.0 as:
This label is used for special characters, including:
• Most emoji component characters, regardless of when they were first encoded.
• Other non-emoji characters in the data files.
… which confounds me.
But it seems to me that Character.isEmoji reporting plain digits as being emoji is a feature, not a bug.
Character.isEmojiPresentation
To determine if a character is what we more commonly think of as an emoji, use another method on the Character class: Character.isEmojiPresentation. That method returns false for the code points of the Basic Latin digits.