Summing Up Emoji Length


I have a column of emoji strings and want to create a new column that holds, for each row, the total length of the emojis in that row. The length should be computed by encoding the row's emoji string as 'utf-16be' and dividing the resulting byte length by 2.

You can reproduce the problem with the code below.

import pandas as pd
import emoji
import re

e_1 = emoji.emojize(":thinking_face:")
e_2 = emoji.emojize(":see-no-evil_monkey:")
e_3 = emoji.emojize(":relieved_face:")
e_4 = emoji.emojize(":two_hearts:")
e_5 = emoji.emojize(":two_women_holding_hands:")
e_6 = emoji.emojize(":bikini:")
e_7 = emoji.emojize(":woman_student_medium-dark_skin_tone:")

df = pd.DataFrame(
    [
        [f"{e_1}{e_2} me así se {e_3} ds {e_4}{e_5}{e_6} hello {e_7}"],
        [f"{e_1}{e_2} me así se {e_3} ds {e_4}{e_5}{e_6} hello"],
        [f"{e_1}{e_2} me así se {e_3} ds"],
        [f"{e_1}{e_2} me así"],
    ],
    columns=["Text"],
)

df['emoji_list'] = df["Text"].apply(lambda row: ''.join(c for c in row if c in emoji.UNICODE_EMOJI))

df["emoji_len"] = sum(df["emoji_list"].apply(lambda x: x.encode('utf-16be')) // 2)

df["emoji_list"] then contains the following in each row:

0 🤔🙈😌💕👭👙👩🏾🎓

1 🤔🙈😌💕👭👙

2 🤔🙈😌

3 🤔🙈

My current code for df['emoji_len'] does not work. It raises the error "unsupported operand type(s) for +: 'int' and 'bytes'". Could anyone please help me correct it?


Solution

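  • A few errors: sum() and // 2 are applied to the bytes objects themselves rather than to their lengths. Take len() of the encoded bytes first, then floor-divide by 2, since each UTF-16 code unit takes 2 bytes. Use decode('utf-16be') only if you need the string back. The example below starts from a column that already holds UTF-16BE bytes:

    import pandas as pd

    byt = 'emoji_xxx'.encode('utf-16be')
    df = pd.DataFrame(dict(emoji_list = [byt for n in range(3)]))
    
    # 2 bytes per UTF-16 code unit, so code units = byte length // 2
    df["emoji_len"] = df["emoji_list"].apply(lambda x: len(x) // 2)
    print(df)
    
                                             emoji_list  emoji_len
    0  b'\x00e\x00m\x00o\x00j\x00i\x00_\x00x\x00x\x00x'          9
    1  b'\x00e\x00m\x00o\x00j\x00i\x00_\x00x\x00x\x00x'          9
    2  b'\x00e\x00m\x00o\x00j\x00i\x00_\x00x\x00x\x00x'          9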
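
For the string column in the question (str values rather than bytes), the per-row count could be sketched roughly as follows. This is a minimal, pandas-free illustration: the helper name utf16_units is hypothetical, and the four strings are the question's emoji rows written as Unicode escapes, assuming the code points shown in the question's output.

```python
def utf16_units(text: str) -> int:
    """Return the number of UTF-16 code units in text.

    UTF-16BE uses 2 bytes per code unit, so the count is the
    encoded byte length floor-divided by 2.
    """
    return len(text.encode("utf-16-be")) // 2


# The four emoji strings from the question (assumed code points),
# written as escapes so the example does not depend on the emoji package.
rows = [
    "\U0001F914\U0001F648\U0001F60C\U0001F495\U0001F46D\U0001F459"
    "\U0001F469\U0001F3FE\U0001F393",
    "\U0001F914\U0001F648\U0001F60C\U0001F495\U0001F46D\U0001F459",
    "\U0001F914\U0001F648\U0001F60C",
    "\U0001F914\U0001F648",
]

# Every code point here lies outside the Basic Multilingual Plane,
# so each one takes two UTF-16 code units (a surrogate pair).
emoji_len = [utf16_units(r) for r in rows]
print(emoji_len)  # [18, 12, 6, 4]
```

With pandas, the same helper should work row by row as df["emoji_len"] = df["emoji_list"].apply(utf16_units), which avoids calling sum() or // on bytes objects.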