Character::isEmoji not working for characters with numbers in them?


I have a Java 21 app where I want to determine whether a string contains an emoji. I am using the emoji methods added to the Character class in Java 21, but whenever the input String contains digits, such as "123", Character::isEmoji returns true. I have been using this as a resource: https://inside.java/2023/11/20/sip089/

This is the code I have been using:

  private boolean containsEmoji(String s) {
    return s.codePoints().anyMatch(Character::isEmoji);
  }

For example:

System.out.println(
        "123".codePoints().anyMatch( Character :: isEmoji )
);

true

And also:

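  private boolean containsEmoji(String s) {
    // Advance by whole code points so a surrogate pair is not visited twice.
    for (int i = 0; i < s.length(); ) {
      int codePoint = s.codePointAt(i);
      if (Character.isEmoji(codePoint)) {
        return true;
      }
      i += Character.charCount(codePoint);
    }
    return false;
  }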

Solution

  • Those digits are emoji, technically

    Yes, digits 0-9 in the Basic Latin (US-ASCII) block of Unicode are considered to be Emoji, for reasons that escape me.

    Follow the trail of documentation:

    1. Javadoc for Character.isEmoji
    2. Unicode Emoji (Technical Standard #51)
    3. emoji-data
    4. emoji-data.txt (for Emoji Version 15.1)

    … lists:

    0030..0039 ; Emoji # E0.0 [10] (0️..9️) digit zero..digit nine

    Section 1.5.2 Versioning of the Unicode page explains comment E0.0 as:

    This label is used for special characters, including:

    • Most emoji component characters, regardless of when they were first encoded.

    • Other non-emoji characters in the data files.

    … which confounds me.

    But it seems to me that Character.isEmoji reporting plain digits as being emoji is a feature, not a bug.
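
    A likely reason for the listing is keycaps: in Technical Standard #51 the digits 0-9 (along with # and *) are the base characters of emoji keycap sequences, formed as digit + U+FE0F + U+20E3. The sketch below is only an illustration of how the Java 21 property methods classify a digit compared with a letter and a pictographic emoji. The class name DigitEmojiDemo and the helper describe are hypothetical names chosen for this example.

```java
public class DigitEmojiDemo {

    // Hypothetical helper: summarizes the emoji-related Unicode properties
    // of a single code point, using the methods added to Character in Java 21.
    static String describe(int codePoint) {
        return String.format("U+%04X emoji=%b presentation=%b component=%b",
                codePoint,
                Character.isEmoji(codePoint),
                Character.isEmojiPresentation(codePoint),
                Character.isEmojiComponent(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe('1'));      // Basic Latin digit one
        System.out.println(describe('a'));      // Basic Latin letter
        System.out.println(describe(0x1F600));  // GRINNING FACE
    }
}
```

    The digit reports the Emoji property but not Emoji_Presentation, which is the distinction the rest of this answer relies on.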

    Use Character.isEmojiPresentation

    To determine whether a character is what we more commonly think of as an emoji, use another method on the Character class: Character.isEmojiPresentation. That method returns false for the code points of the Basic Latin digits.
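
    Applied to the method from the question, that could look like the minimal sketch below. The class name EmojiCheck is a hypothetical wrapper added so the example compiles on its own. One assumption to keep in mind: characters that default to text presentation and only render as emoji when followed by the variation selector U+FE0F are not detected by this check.

```java
public class EmojiCheck {

    // Returns true if any code point in s has the Emoji_Presentation property,
    // i.e. is displayed as an emoji by default. Plain digits do not qualify.
    static boolean containsEmoji(String s) {
        return s.codePoints().anyMatch(Character::isEmojiPresentation);
    }

    public static void main(String[] args) {
        System.out.println(containsEmoji("123"));               // digits only
        System.out.println(containsEmoji("abc \uD83D\uDE00"));  // text plus U+1F600 GRINNING FACE
    }
}
```

    Using codePoints() rather than chars() matters here, because most emoji lie outside the Basic Multilingual Plane and occupy two char values in a String.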