
How to find a textual description of emoticons, unicode characters and emoji in a string (python, perl)?


The detection and counting of emoticon icons has been addressed previously.

As a follow-up to that question and the solution provided, I'd like to extend it with the ability to link the detected emoticons, Unicode characters and emoji to their corresponding (textual) descriptions:

  • emoticons (Western and Eastern, e.g. List_of_emoticons from Wikipedia),
  • Unicode characters (e.g. U1F600.pdf, available from the Unicode website; a direct link is included in the previous Stack Overflow question mentioned above),
  • other emoji types, e.g. from the list of emoji frequently used in Twitter (twitter-emoji-list from the emojipedia website).

Is there a comprehensive solution already available for this kind of translation, in Python or Perl, similar to the method implemented in Swift? If not, can you make a script that provides a textual description for each emoticon/emoji found in a string?


Solution

  • perl example using charnames:

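    use 5.014;
    use strict;
    use warnings;
    use utf8;                  # the source file contains literal UTF-8 characters
    use open qw(:std :utf8);   # encode STDOUT/STDERR as UTF-8
    use charnames ':full';     # provides charnames::viacode
    
    # Split the string into individual characters (code points).
    my @faces = split //, '😄😀😈';
    for (@faces) {
        # Print the code point, the character itself, and its Unicode name.
        say sprintf "U+%05X %s %s",
            ord($_), $_, charnames::viacode(ord($_));
    }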
    

    prints

    U+1F604 😄 SMILING FACE WITH OPEN MOUTH AND SMILING EYES
    U+1F600 😀 GRINNING FACE
    U+1F608 😈 SMILING FACE WITH HORNS
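
Since the question also asks about Python, the same lookup can be sketched with the standard-library unicodedata module. This is only a minimal sketch, not part of the original answer: the function name describe_characters is made up for illustration, and skipping everything at or below U+007F is a crude assumption for ignoring plain ASCII text (accented letters and other non-ASCII characters will be described too).

```python
import unicodedata


def describe_characters(text):
    """Return (code point, character, Unicode name) for each non-ASCII character.

    Hypothetical helper for illustration; the ASCII cut-off is an assumption,
    not a real emoji detector.
    """
    rows = []
    for ch in text:
        if ord(ch) > 0x7F:
            # Fall back to a placeholder when the character has no name
            # in this Python build's Unicode database.
            name = unicodedata.name(ch, "<unnamed>")
            rows.append(("U+%05X" % ord(ch), ch, name))
    return rows


if __name__ == "__main__":
    for code_point, ch, name in describe_characters("hi 😄😀😈"):
        print(code_point, ch, name)
```

With a Python 3 build whose Unicode database includes these characters, this should print the same three names as the Perl version. Note that ASCII emoticons such as :-) are not single Unicode characters, so neither charnames nor unicodedata can describe them; they would need a separate, hand-built lookup table.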