Tags: python, twitter, twitter-oauth, tweepy

Filtering Twitter data using Tweepy


I've used Marco Bonzanini's tutorial on mining Twitter data (https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/) to set up this listener:

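# Tweepy 3.x imports, as used in the tutorial
from tweepy import Stream
from tweepy.streaming import StreamListener

class MyListener(StreamListener):

    def on_data(self, data):
        try:
            # Append every raw message received from the stream to the file
            with open('python.json', 'a') as f:
                f.write(data)
        except Exception as e:
            print("Error on_data: %s" % str(e))
        return True  # keep the stream connection open

    def on_error(self, status):
        print(status)
        return True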

I then used the "follow" parameter of the filter method to retrieve the tweets posted by a specific user ID:

twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(follow=["63728193"])  # random Twitter ID

However, this does not do what I want: it returns not only the tweets and retweets created by that ID, but also every tweet in which the ID is mentioned (e.g. other users' retweets of its tweets).

I'm sure there must be a way to do this, since the JSON returned by Twitter contains a "screen_name" field that gives the name of the tweet's creator. I just need to find out how to filter the data on this screen_name field.


Solution

  • This behaviour is by design. To quote the Twitter streaming API docs:

    For each user specified, the stream will contain:

    • Tweets created by the user.
    • Tweets which are retweeted by the user.
    • Replies to any Tweet created by the user.
    • Retweets of any Tweet created by the user.
    • Manual replies, created without pressing a reply button (e.g. “@twitterapi I agree”).

    The best way for you to process it for your purposes is to check who created the tweet as it is received, which I believe can be done as follows:

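    import json

    class MyListener(StreamListener):
        def on_data(self, data):
            try:
                # data is the raw JSON string, so parse it before inspecting it
                tweet = json.loads(data)
                # Compare id_str: the numeric "id" field is an integer, not a string
                if tweet.get('user', {}).get('id_str') == "63728193":
                    with open('python.json', 'a') as f:
                        f.write(data)
            except Exception as e:
                print("Error on_data: %s" % str(e))
            return True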
    
        def on_error(self, status):
            print(status)
            return True
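
The author check can also be pulled out into a small standalone function, which makes it easy to test without a live stream and to reuse on a file that has already been saved. The following is only a sketch: the names `is_authored_by` and `filter_saved_tweets` are made up for illustration, and it assumes each message is a standard Tweet JSON payload whose top-level `user` object carries an `id_str` field.

```python
import json

TARGET_USER_ID = "63728193"  # the example ID from the question


def is_authored_by(raw_message, user_id=TARGET_USER_ID):
    """Return True if the raw stream message is a tweet posted by user_id."""
    try:
        message = json.loads(raw_message)
    except ValueError:
        return False  # not valid JSON, e.g. a blank keep-alive line
    if not isinstance(message, dict):
        return False
    user = message.get("user")
    if not isinstance(user, dict):
        return False  # e.g. delete or limit notices, which have no "user" object
    # Compare as strings so an int or str user_id both work
    return user.get("id_str") == str(user_id)


def filter_saved_tweets(lines, user_id=TARGET_USER_ID):
    """Yield only the raw JSON lines whose author is user_id."""
    for line in lines:
        if is_authored_by(line, user_id):
            yield line
```

Inside `on_data`, the condition would then become `if is_authored_by(data):`. Assuming python.json holds one JSON object per line, the same check can be applied afterwards with `filter_saved_tweets(open('python.json'))`. Retweets made by the followed user still pass, because the top-level `user` of a retweet is the account that retweeted.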