I am streaming tweets using Tweepy, filtered by these tags: ["corona", "quarantine", "covid19"].
Given a tweet such as "I fell down the stairs and ate an apple so no doctor #quarantine", I would like to extract strings like "stairs", "apple", and "doctor" as a set of keywords.
Is there any way to do this?
I am a beginner at Python and I am following video tutorials on YouTube to start this project. This is the code I have so far:
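from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener  # Tweepy 3.x API


class StdOutListener(StreamListener):
    def on_data(self, data):
        # data is the raw JSON string for one tweet
        print(data)
        return True

    def on_error(self, status):
        print(status)


if __name__ == '__main__':
    # consumer_key, consumer_secret, access_token and access_token_secret
    # must be defined beforehand with your own Twitter API credentials
    lis = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, lis)
    stream.filter(track=['covid19', 'corona', 'quarantine'])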
You can use a list comprehension:
tags = ["corona", "quarantine", "covid19"]
tweet = "I fell down the stairs and ate an apple so no doctor #quarantine"
# print each word in the tweet that is longer than two characters and
# does not contain any of the tag words
print([word for word in tweet.split() if len(word) > 2 and not any(tag in word for tag in tags)])
This isn't a perfect solution, mainly because it excludes any word that merely contains a tag as a substring. For example, if one of the tags were "wash", then the word "washington" would be excluded too. But it's a start.
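One possible way to get around the substring problem is to compare whole words against the tags rather than searching inside each word, and to drop common filler words at the same time. The sketch below is only an illustration under a few assumptions: `STOP_WORDS` is a tiny hand-picked list (a real project would probably use a fuller one), and `extract_keywords` and `keywords_from_raw_tweet` are hypothetical helper names, not part of Tweepy. The second helper assumes the string passed to `on_data` is the tweet's JSON with the message under the "text" key.

```python
import json
import re

TAGS = {"corona", "quarantine", "covid19"}

# Assumption: a tiny, hand-picked stop word list, for illustration only
STOP_WORDS = {"a", "an", "and", "the", "i", "so", "no", "down",
              "of", "to", "in", "on", "is", "it"}


def extract_keywords(tweet, tags=TAGS, stop_words=STOP_WORDS):
    """Return the set of words that are not tags, stop words, or very short."""
    keywords = set()
    # Lowercase the tweet and pull out words, keeping an optional leading '#'
    for word in re.findall(r"#?\w+", tweet.lower()):
        bare = word.lstrip("#")  # treat "#quarantine" like "quarantine"
        # Whole-word comparison, so "washington" is not dropped for tag "wash"
        if len(bare) > 2 and bare not in tags and bare not in stop_words:
            keywords.add(bare)
    return keywords


def keywords_from_raw_tweet(data):
    """Hypothetical helper: parse the raw JSON string and extract keywords."""
    tweet = json.loads(data)
    return extract_keywords(tweet.get("text", ""))


tweet = "I fell down the stairs and ate an apple so no doctor #quarantine"
print(extract_keywords(tweet))
```

With this small stop word list the result still contains verbs such as "fell" and "ate" alongside "stairs", "apple", and "doctor". Narrowing it down to nouns only would need something like part-of-speech tagging, which is beyond what this sketch does. Inside the listener, `keywords_from_raw_tweet(data)` could be called from `on_data` instead of printing the raw string.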