I am trying to extract emoji counts from text and am currently struggling to give the output columns titles.
If I try to assign column names, pandas only recognizes one column (the count), not the emoji column itself, which is the first one, so I assume the issue lies there. I thought that setting the index to 0 would solve this, but apparently I misunderstood it.
Would I need to convert the df into a Series so that the first and second columns are recognized as such and can be named? (Let's say: columns=['emoji', 'count'])
MWE:
import csv
import re
import pandas as pd
import emoji
import regex
from collections import Counter
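def split_count(text):
    emoji_list = []
    # \X matches one extended grapheme cluster (needs the third-party regex module)
    data = regex.findall(r'\X', text)
    for word in data:
        # Note: emoji.UNICODE_EMOJI exists only in emoji < 2.0; later versions use emoji.EMOJI_DATA
        if any(char in emoji.UNICODE_EMOJI['en'] for char in word):
            emoji_list.append(word)
    return emoji_list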
df = pd.DataFrame([{'id':1432691762007400458,'created_at':"2021-08-31T13:08:28.000Z",'text':"9月1日㈬は…🎺🎺🎶\n\n♦️18:50〜 山岸一生\n『練馬から変える!国会を創る!キックオフ集会』\nhttpstest\n\n♦️20:30~ 辻元清美\n#りっけんチャンネル\n「コロナ禍・五輪から見えた""おっさん政治""の実態」について\nhttpsxzt\n\n【テレビ】\n♦️19:30~ 玄葉光一郎\nBS-TBS「報道1930」",'author_id':951781409470889984},
{'id':1432687902148816898,'created_at':"2021-08-31T12:53:08.000Z",'text':"やはり別の地平であった🎶か...\n\nコロナ禍五輪\n貴重な物資をなぜ捨てる?",'author_id':1227501971742937088}])
text = df['text']
emoji_list = []
for t in text:
    emoji_list = emoji_list + split_count(t)
df_sa = pd.DataFrame((Counter(emoji_list)), index=[0])
df_t = df_sa.T
print(df_t)
Current Result:
0
🎺 2
🎶 2
♦️ 3
Desired Result:
emoji count
🎺 2
🎶 2
♦️ 3
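In your code the emoji end up in the index rather than in a regular column, which is why only one column (the count) can be named. Let us fix the way you are creating the dataframe by passing the Counter's (emoji, count) pairs as rows: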
pd.DataFrame(Counter(emoji_list).items(), columns=['emoji', 'count'])
emoji count
0 🎺 2
1 🎶 2
2 ♦️ 3
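To address the Series part of the question: converting to a Series is one option, but it is not required. The sketch below is only an illustration, assuming a hard-coded `emoji_list` as a stand-in for the list built in the question (so it runs without the `emoji` and `regex` packages). It shows that the original transposed frame has a single column with the emoji as its index, and two ways to turn that index into a named column.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the emoji_list produced by split_count() in the question.
emoji_list = ["🎺", "🎺", "🎶", "🎶", "♦️", "♦️", "♦️"]
counts = Counter(emoji_list)

# Original approach: the emoji become the index, leaving a single column labelled 0.
df_t = pd.DataFrame(counts, index=[0]).T
print(df_t.shape)

# Option 1: keep the original frame, name the index, then move it into a column.
df_fixed = df_t.rename_axis("emoji").reset_index().rename(columns={0: "count"})
print(df_fixed)

# Option 2: go through a Series, as asked; reset_index(name=...) names the value column.
df_series = pd.Series(counts).rename_axis("emoji").reset_index(name="count")
print(df_series)
```

Both options give the same two named columns as the `Counter(...).items()` approach above; which one to use is mostly a matter of taste.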