python-3.x unicode emoji

How to handle a list that contains emoji in Python 3


I've been writing a function that takes a list containing only emoji, converts each one to UTF-8, and returns the resulting list. My current code seems to pass multiple arguments to a function and raises an error. I'm new to handling emoji. Could you give me some tips?

main.py

def encode_emoji(emoji_list):
    result = []
    for i in range(len(emoji_list)):
        emoji = str(emoji_list[i])
        d_ord = format(ord(":{}:","#08x").format(emoji))
        result.append(str(d_ord))
        break
    return result 


encode_emoji(["😀","😃","😄"])
Result of the above code:

Traceback (most recent call last):
  File "main.py", line 11, in <module>
    encode_emoji(["😀","😃","😄"])
  File "main.py", line 5, in encode_emoji
    d_ord = format(ord(":{}:","#08x").format(emoji))
TypeError: ord() takes exactly one argument (2 given)

Solution

  • I can't tell how you intended to get the UTF-8 encoding of an emoji with this line:

    d_ord = format(ord(":{}:","#08x").format(emoji))
    

    As the error message says, ord takes a single argument, a string of length 1, and returns an integer. Even if the call were rearranged to format(ord(emoji), "#08x"), the result would only be the emoji's code point written as a zero-padded hexadecimal number with a 0x prefix, not the UTF-8 byte sequence for the emoji.

    To encode some text into utf-8, just call the encode method of the string itself.
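
    To make the difference concrete, here is a minimal sketch that compares the code point returned by ord with the bytes returned by encode. The variable names are only illustrative.

```python
emoji = "\U0001F600"  # the grinning face emoji, a single code point

# ord() returns the Unicode code point as an integer.
code_point = ord(emoji)

# "#08x" formats that integer as hexadecimal with a 0x prefix,
# zero-padded to a total width of 8 characters.
as_hex = format(code_point, "#08x")

# encode() returns the actual UTF-8 byte sequence.
utf8_bytes = emoji.encode("utf-8")

print(as_hex)      # 0x01f600
print(utf8_bytes)  # b'\xf0\x9f\x98\x80'
```

    The code point is one number, while its UTF-8 form is four bytes with different values, which is why formatting ord's result cannot produce the UTF-8 encoding.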

    Also, in Python, one almost never needs the for ... in range(len(...)) pattern, because for is designed to iterate directly over any sequence or iterable.

    Your code also has a stray break statement that would stop processing after the first emoji.

    Without using the list-comprehension syntax, a function to encode emoji as utf-8 byte strings is just:

    def encode_emoji(emoji_list):
        result = []
        # Iterate over the items directly; no index is needed.
        for part in emoji_list:
            result.append(part.encode("utf-8"))
        return result
    

    Once you get more acquainted with the language and understand comprehensions, it is just:

    
    def encode_emoji(emoji_list):
        return [part.encode("utf-8") for part in emoji_list]
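
    As a quick check, this sketch calls the comprehension version on the list from the question and decodes the result back. The function is repeated so the snippet runs on its own.

```python
def encode_emoji(emoji_list):
    # Encode each string in the list to its UTF-8 bytes.
    return [part.encode("utf-8") for part in emoji_list]


emoji_list = ["\U0001F600", "\U0001F603", "\U0001F604"]  # same three emoji as in the question
encoded = encode_emoji(emoji_list)
print(encoded)
# [b'\xf0\x9f\x98\x80', b'\xf0\x9f\x98\x83', b'\xf0\x9f\x98\x84']

# Decoding the bytes gives back the original strings.
decoded = [item.decode("utf-8") for item in encoded]
print(decoded == emoji_list)  # True
```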
    

    Now, given the #08x pattern in your code, it may be that you have misunderstood what UTF-8 means and are simply trying to write the emoji as HTML numeric character references, which will later be embedded in text that is itself encoded as UTF-8.

    In that case, you do have to call ord(emoji) to get its code point, then represent the resulting number as hexadecimal and replace the leading 0x that Python's hex call yields with &#x, ending the reference with a semicolon:

    
    def encode_emoji(emoji_list):
        return [hex(ord(emoji)).replace("0x", "&#x") + ";" for emoji in emoji_list]
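
    One caveat: ord accepts exactly one character, and some emoji (flags, for example) are made of several code points, so ord(emoji) raises a TypeError for them. Below is a minimal sketch that emits one reference per code point, assuming HTML hexadecimal character references are what you want. The name to_html_refs is hypothetical, and html.unescape from the standard library is used only to check the round trip.

```python
import html


def to_html_refs(emoji_list):
    # Emit one hexadecimal character reference per code point, so
    # emoji made of several code points are handled too.
    return ["".join(f"&#x{ord(ch):x};" for ch in emoji) for emoji in emoji_list]


grinning = "\U0001F600"          # one code point
flag = "\U0001F1EF\U0001F1F5"    # flag of Japan, two code points

refs = to_html_refs([grinning, flag])
print(refs)  # ['&#x1f600;', '&#x1f1ef;&#x1f1f5;']

# html.unescape turns the references back into the original text.
print([html.unescape(r) for r in refs] == [grinning, flag])  # True
```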