I am trying to find the best way to remove Twitter usernames from a user's tweet when one is present. For example, I have a list of stored tweets and I would like to return each tweet with the username taken out, like this:
tweets = ['@joe123 thank you', 'this reminds me of @john12', 'this tweet has no username tag in it']
clean_tweets = ['thank you', 'this reminds me of', 'this tweet has no username tag in it']
Here is what I have so far:
tweets = ['@joe123 thank you', 'this reminds me of @john12', 'this tweet has no username tag in it']
clean_tweets = [word for tweet in tweets for word in tweet.split() if not word.startswith('@')]
However, the output is a single flat list of words rather than one string per tweet:
['thank',
'you',
'this',
'reminds',
'me',
'of',
'this',
'tweet',
'has',
'no',
'username',
'tag',
'in',
'it']
I am hoping for a better way to solve this than a nested list comprehension. Maybe an apply function with a lambda would work better? Any help is appreciated, thanks.
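There are many ways. One is to use a regular expression: replace an @ followed by at least one word character (a letter, digit, or underscore) with an empty string, then strip the whitespace left behind at either end of the tweet. Without the strip() call, the first two results would keep a stray leading or trailing space.
import re
[re.sub(r'@\w+', '', x).strip() for x in tweets]
#['thank you', 'this reminds me of', 'this tweet has no username tag in it']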
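If you would rather stay close to your original attempt, the fix is to join the filtered words back together per tweet instead of flattening them into one list. Here is a minimal sketch of both approaches side by side. The helper names strip_mentions_split and strip_mentions_regex and the MENTION pattern are made up for illustration, and the regex variant assumes a username is just @ followed by word characters.

```python
import re

tweets = [
    '@joe123 thank you',
    'this reminds me of @john12',
    'this tweet has no username tag in it',
]

def strip_mentions_split(tweet):
    # Keep every whitespace-separated word that does not start with '@',
    # then join the remaining words back into one string per tweet.
    return ' '.join(word for word in tweet.split() if not word.startswith('@'))

# Hypothetical pattern: a mention plus any whitespace directly in front of it,
# so removing a mention in the middle of a tweet does not leave a double space.
MENTION = re.compile(r'\s*@\w+')

def strip_mentions_regex(tweet):
    # Remove the mentions, then trim whitespace left at the ends.
    return MENTION.sub('', tweet).strip()

clean_split = [strip_mentions_split(t) for t in tweets]
clean_regex = [strip_mentions_regex(t) for t in tweets]

print(clean_split)
print(clean_regex)
```

The two are not fully equivalent. The split version drops the whole word, so punctuation attached to a mention (as in '@john12,') goes with it, and it also collapses runs of whitespace. The regex version removes only the @name part, but it will also match an @ inside a word, such as an email address.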