
Exclude usernames from tweets, using regular expressions, in RapidMiner


I'm working on a sentiment analysis problem and trying to exclude usernames from the text of tweets. For example, given the following tweet:

`Hey @SOCommunity check this out!`

I'm trying to keep just this:

`Hey check this out!`

So far I've seen how to select the username with `@\S+\s+`, and I've tried to negate it with the expression `^(?!@\S+\s+)\w+`, which only captures `Hey` and leaves out the rest of the tweet.

How should I edit the expression to also catch the rest of the tweet?


Solution

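  • You can use sed to remove the username from the text, rather than trying to match everything except the username. The sed command is sed 's/@[a-zA-Z0-9_]\{1,15\} //' (Twitter usernames contain only letters, digits, and underscores, and are at most 15 characters long).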

    Example:

     echo 'Hey @SOCommunity1 check this out!' | sed 's/@[a-zA-Z0-9_]\{1,15\} //'
    

    Output:

    Hey check this out!
    

    To apply the sed command to a file named tweets.txt:

    sed 's/@[a-zA-Z0-9_]\{1,15\} //' tweets.txt
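The command above replaces only the first mention on each line, and only when a space follows it. A minimal sketch of one way to extend it, assuming tweets arrive one per line on standard input: the `g` flag removes every mention that is followed by a space, and a second expression removes a mention at the very end of the line. The function name `strip_mentions` is hypothetical and used here only for illustration.

```shell
# Hypothetical helper (not part of the original answer):
# strips every @mention from each line read on standard input.
strip_mentions() {
  # 1st expression: delete each mention together with the single space after it.
  # 2nd expression: delete a mention at the end of the line, plus an optional space before it.
  sed -e 's/@[a-zA-Z0-9_]\{1,15\} //g' \
      -e 's/ \{0,1\}@[a-zA-Z0-9_]\{1,15\}$//'
}

printf '%s\n' 'Hey @SOCommunity check this out!' | strip_mentions
# prints: Hey check this out!

printf '%s\n' '@alice and @bob thanks @carol' | strip_mentions
# prints: and thanks
```

Like the original command, this sketch also matches an `@` inside other text such as an e-mail address, so it suits input where `@` only introduces usernames.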