I am wondering how to extract all the emojis from text, then add them to a new column while removing them from the original text - if that makes sense.
For example, consider this data:
ID | Text |
---|---|
1 | This is good 💯 |
2 | Loving you so much 😍 ❤️ |
3 | You make me sad! 😥 |
This is my anticipated output:
ID | Text | Emoji |
---|---|---|
1 | This is good | 💯 |
2 | Loving you so much | 😍 ❤️ |
3 | You make me sad! | 😥 |
So far, I have tried this solution, but it has not worked for me, as it does not remove the emoji from the original text.
Any help on how to do this would be great.
Thanks!
Something along the following lines should work for your purposes:
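    import pandas as pd
    import emoji as emj

    # UNICODE_EMOJI exists in emoji 1.x; releases from 2.0 on dropped it
    # in favor of EMOJI_DATA.
    EMOJIS = emj.UNICODE_EMOJI["en"]

    df = pd.DataFrame(
        data={
            "text": [
                "This is good 💯",
                "Loving you so much 😍 ❤️",
                "You make me sad! 😥",
            ]
        }
    )


    def extract_emoji(df):
        # Start every row with an empty emoji string.
        df["emoji"] = ""
        for index in df.index:
            for emoji in EMOJIS:
                if emoji in df.at[index, "text"]:
                    # Write through df.at: rows yielded by iterrows() may be
                    # copies, so assigning to them can silently do nothing.
                    df.at[index, "text"] = df.at[index, "text"].replace(emoji, "")
                    df.at[index, "emoji"] += emoji


    extract_emoji(df)
    print(df.to_string())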
      text emoji
    0 This is good 💯
    1 Loving you so much ️ ❤️😍
    2 You make me sad! 😥
Note that `extract_emoji` modifies the `DataFrame` in place.
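In the output above, row 1 ends up with a stray variation selector in the text, and its emoji come out in dictionary order rather than in the order they appear. If that matters, one possible workaround is a single regex pass over each string. The sketch below is only a rough approximation: `EMOJI_PATTERN` and `split_emoji` are names I made up for illustration, and the code point ranges are my own assumption that covers the examples in the question, not the full Unicode emoji set.

```python
import re

# Assumption: a rough approximation of emoji code points, NOT the complete set.
EMOJI_PATTERN = re.compile(
    "(?:"
    "[\U0001F000-\U0001FAFF"  # most pictographic emoji, e.g. 💯 😍 😥
    "\u2600-\u27BF]"          # miscellaneous symbols and dingbats, e.g. ❤
    "[\uFE0F\u200D\U0001F3FB-\U0001F3FF]*"  # variation selector, ZWJ, skin tones
    ")+"
)


def split_emoji(text):
    """Return (text without emoji, emoji in order of appearance, space-joined)."""
    found = EMOJI_PATTERN.findall(text)
    cleaned = EMOJI_PATTERN.sub("", text)
    # Collapse the whitespace left behind where emoji were removed.
    # Note that this also normalizes any other runs of whitespace.
    cleaned = " ".join(cleaned.split())
    return cleaned, " ".join(found)


for sample in ["This is good 💯", "Loving you so much 😍 ❤️", "You make me sad! 😥"]:
    print(split_emoji(sample))
```

Because the function works on plain strings, it could be applied to a column with something like `df["text"].map(split_emoji)`, which yields a Series of `(text, emoji)` tuples to unpack into the two columns. For anything beyond a quick cleanup, the emoji package's own data is likely to be more complete than these hand-picked ranges.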