I'm preprocessing tweets and need to cap the number of consecutive occurrences of "@USER" at 3. For example, a tweet like this:
this tweet contains hate speech @USER@USER@USER@USER@USER about a target group @USER@USER
after processing, should look like this:
this tweet contains hate speech @USER@USER@USER about a target group @USER@USER
I was able to achieve the desired result with a while loop, but I'm wondering if someone knows a simpler way to do it. Thanks!

    tweets = ["this tweet contains hate speech @USER@USER@USER@USER@USER about a target group @USER@USER"]
    K = "@USER"
    limit = 3
    for tweet in tweets:
        words = tweet.split(' ')
        i = 0  # reset the index for every tweet
        while i < len(words):
            # Cap any word that holds more than `limit` copies of K
            if words[i].count(K) > limit:
                words[i] = K * limit
            i += 1
        # Rejoin only after every word has been checked
        tweet = " ".join(words)
        print(tweet)
    # Output: this tweet contains hate speech @USER@USER@USER about a target group @USER@USER

You can just use re to replace 4 or more consecutive occurrences of @USER with three:
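
    import re
    tweet = "this tweet contains hate speech @USER@USER@USER@USER@USER about a target group @USER@USER"
    print(re.sub(r'(@USER){4,}', r'@USER@USER@USER', tweet))
    # Output: this tweet contains hate speech @USER@USER@USER about a target group @USER@USER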
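
If the token or the limit might change, the same idea can be wrapped in a small helper. This is only a sketch: the function name `cap_repeats` and its parameters are made up for illustration and do not come from any library. It builds the `{n,}` quantifier from the limit and escapes the token so that any special regex characters in it are matched literally.

```python
import re

def cap_repeats(text, token="@USER", limit=3):
    """Shorten runs of more than `limit` consecutive `token`s to exactly `limit`."""
    # (?:...){limit+1,} matches only runs that exceed the limit;
    # re.escape keeps special characters in the token literal.
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(token), limit + 1))
    # A function replacement avoids backslash processing in the replacement text.
    return pattern.sub(lambda match: token * limit, text)

tweets = ["this tweet contains hate speech @USER@USER@USER@USER@USER about a target group @USER@USER"]
cleaned = [cap_repeats(tweet) for tweet in tweets]
print(cleaned[0])
```

Note that, unlike the while loop in the question, this only touches the run of tokens itself, so other characters attached to the same word (for example trailing punctuation) are preserved.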