I am trying to read in tweets and write them to a file. However, I get a UnicodeEncodeError when I try to write some of them. Is there a way to remove these non-UTF-8 characters so I can write out the rest of the tweet?
For example, a problem tweet may look like this:
Camera? 🎥
This is the code I am using:
with open("Tweets.txt",'w') as f:
    for user_tws in twitter.get_user_timeline(screen_name='camera',
                                              count = 200):
        try:
            f.write(user_tws["text"] + '\n')
        except UnicodeEncodeError:
            print("skipped: " + user_tws["text"])
            mod_tw = user_tws["text"]
            mod_tw=mod_tw.encode('utf-8','replace').decode('utf-8')
            print(mod_tw)
            f.write(mod_tw)
The error is this:
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f3a5' in position 56: character maps to <undefined>
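You are not writing a UTF-8 encoded file. Without an encoding argument, open() falls back to the platform's default encoding (the 'charmap' codec in your error, typically a Windows code page such as cp1252), which cannot represent the emoji. Add the encoding parameter to the open function: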
with open("Tweets.txt",'w', encoding='utf8') as f:
    ...
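If it helps, here is a minimal self-contained sketch of both options: writing the file as UTF-8 so nothing is lost, and, if the file really has to stay in a legacy code page, dropping the characters that code page cannot represent. The `tweets` list, the temporary directory, and the choice of cp1252 are assumptions for illustration only; in your script the strings would come from `user_tws["text"]`.

```python
import os
import tempfile

# Hypothetical sample data standing in for the text of each tweet.
tweets = ["Camera? \U0001f3a5", "plain ASCII tweet"]

# The emoji is valid UTF-8, so a round trip through UTF-8 changes nothing.
# This is why the encode(...).decode(...) workaround in the question has no effect.
assert tweets[0].encode("utf-8", "replace").decode("utf-8") == tweets[0]

# Written to a temporary directory here so the sketch does not touch real files.
out_dir = tempfile.mkdtemp()

# Option 1: write UTF-8, which can represent every character in the tweets.
utf8_path = os.path.join(out_dir, "Tweets.txt")
with open(utf8_path, "w", encoding="utf-8") as f:
    for text in tweets:
        f.write(text + "\n")

with open(utf8_path, encoding="utf-8") as f:
    utf8_lines = f.read().splitlines()

# Option 2 (assumes the file must stay in cp1252): errors="ignore" silently
# drops characters the codec cannot encode instead of raising.
legacy_path = os.path.join(out_dir, "Tweets_cp1252.txt")
with open(legacy_path, "w", encoding="cp1252", errors="ignore") as f:
    for text in tweets:
        f.write(text + "\n")

with open(legacy_path, encoding="cp1252") as f:
    legacy_lines = f.read().splitlines()

print(utf8_lines)
print(legacy_lines)
```

Option 1 is usually the better choice because it keeps the tweets intact. Option 2 loses data, and using errors="replace" instead of "ignore" would write a "?" in place of each dropped character.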
Have fun 🎥