Tags: python, twitter, tweepy

Is there a way to obtain words from a tweet that aren't used to filter tweets?


I am streaming tweets with Tweepy, filtered by these tags: ["corona", "quarantine", "covid19"].

For instance, given the tweet "I fell down the stairs and ate an apple so no doctor #quarantine", I would like to get strings like "stairs", "apple", and "doctor" as a set of keywords.

Is there any way to do this?

I am a beginner at Python and I am using video tutorials on YouTube to start this project.

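# Written against the Tweepy 3.x streaming API (StreamListener).
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

class StdOutListener(StreamListener):

    def on_data(self, data):
        # data is the raw JSON string for one streamed message
        print(data)
        return True

    def on_error(self, status):
        # status is the HTTP status code of the failed request
        print(status)

if __name__ == '__main__':

    # consumer_key, consumer_secret, access_token and access_token_secret
    # are the app credentials, defined elsewhere
    lis = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, lis)

    stream.filter(track=['covid19','corona','quarantine'])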

Solution

  • You can use a list comprehension:

    tags =  ["corona", "quarantine", "covid19"]
    tweet = "I fell down the stairs and ate an apple so no doctor #quarantine"
    
    # print each word in the tweet that is longer than two characters and
    # does not contain any of the tag words
    print([word for word in tweet.split() if len(word) > 2 and not any(tag in word for tag in tags)])
    

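    This isn't a perfect solution, mainly because it excludes any word that contains a tag as a substring: for example, if one of the tags were "wash", the word "washington" would be excluded too. It also keeps common words such as "the" and "and", so the result is broader than just "stairs", "apple", and "doctor". But it's a start.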
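One possible refinement of the list comprehension above, offered as a rough sketch rather than a definitive answer: compare each normalized word against the tags exactly instead of by substring, and drop common words with a stop-word list. The names extract_keywords, keywords_from_raw, and STOP_WORDS are hypothetical, the stop-word list is deliberately tiny, and keywords_from_raw assumes the JSON string passed to on_data carries the tweet body in a top-level "text" field.

```python
import json
import string

# Tags passed to stream.filter(track=...) in the question.
TAGS = {"corona", "quarantine", "covid19"}

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real project would probably want a much fuller list.
STOP_WORDS = {"i", "a", "an", "the", "and", "so", "no", "down"}


def extract_keywords(text, tags=TAGS, stop_words=STOP_WORDS):
    """Return the words in text that are neither tags nor stop words."""
    keywords = []
    for token in text.split():
        # Normalize: lowercase, then strip surrounding punctuation
        # such as '#', '.' or ','.
        word = token.lower().strip(string.punctuation)
        # Exact comparison, so "washington" survives a "wash" tag.
        if not word or word in tags or word in stop_words:
            continue
        keywords.append(word)
    return keywords


def keywords_from_raw(data):
    """Parse one raw JSON message (as received by on_data) and extract keywords.

    Assumes the tweet body is in a top-level "text" field; messages
    without that field yield an empty list.
    """
    status = json.loads(data)
    return extract_keywords(status.get("text", ""))


tweet = "I fell down the stairs and ate an apple so no doctor #quarantine"
print(extract_keywords(tweet))
# ['fell', 'stairs', 'ate', 'apple', 'doctor']
```

Inside the listener, on_data could call keywords_from_raw(data) instead of printing the raw message. Verbs such as "fell" and "ate" still come through; keeping only nouns like "stairs", "apple", and "doctor" would need part-of-speech tagging, which this sketch does not attempt.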