python string split data-cleaning

Best way to remove Twitter usernames from a list of tweets?


I am trying to find the best way to remove Twitter usernames from a user's tweet when one is present. For example, I have a list of stored tweets and would like to return each tweet with the username taken out, like this:

tweets = ['@joe123 thank you', 'this reminds me of @john12', 'this tweet has no username tag in it']

clean_tweets = ['thank you', 'this reminds me of', 'this tweet has no username tag in it']

Here is what I have so far:

tweets = ['@joe123 thank you', 'this reminds me of @john12', 'this tweet has no username tag in it']

clean_tweets = [word for tweet in tweets for word in tweet.split() if not word.startswith('@')]

However, the output looks like this:

['thank',
 'you',
 'this',
 'reminds',
 'me',
 'of',
 'this',
 'tweet',
 'has',
 'no',
 'username',
 'tag',
 'in',
 'it']

I am hoping for a better way to solve this than a nested list comprehension. Maybe an apply function with a lambda would work better? Anything helps, thanks.


Solution

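  • There are many ways. One is a regular expression: replace each @ followed by one or more word characters (letters, digits, or underscores) with an empty string, then strip the whitespace left behind at the start or end of the tweet.

    import re
    # .strip() removes the leading/trailing space left where the username was
    [re.sub(r'@\w+', '', x).strip() for x in tweets]
    #['thank you', 'this reminds me of', 'this tweet has no username tag in it']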
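
The comprehension in the question is close: it flattens everything into one list because the filtered words are never joined back together per tweet. A minimal sketch without regular expressions, assuming usernames are always whitespace-separated tokens that begin with @ (the helper name strip_mentions is made up for illustration):

```python
tweets = [
    '@joe123 thank you',
    'this reminds me of @john12',
    'this tweet has no username tag in it',
]

def strip_mentions(tweet):
    # Keep every whitespace-separated token that does not start with '@',
    # then rejoin the surviving words into a single string.
    return ' '.join(word for word in tweet.split() if not word.startswith('@'))

# One cleaned string per tweet, so the result has the same length as the input.
clean_tweets = [strip_mentions(tweet) for tweet in tweets]
print(clean_tweets)
# ['thank you', 'this reminds me of', 'this tweet has no username tag in it']
```

Because the words are rejoined with single spaces, a username in the middle of a tweet does not leave a double space behind. A token such as me@example.com is also left alone, since it does not start with @, whereas the pattern @\w+ would cut it down to me.com. The trade-off is that punctuation attached to a username (for example "@john12,") is dropped along with it.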