Python: How to get all the replies to Tweets from a Twitter account?


I'm retrieving all the Tweets I need from a Twitter account, which means more than 200 Tweets (for example 500 or 600).

I'm using the Tweepy library to do this with Python, and I have created this class for it.

from rrss.twitter_connection import TwitterConnection
import tweepy


class Tweets:
    def __init__(self):
        self.all_tweets = []    # List of tweets
        self.__total_tweets = None
        self.__screen_name = None
        self.__replies = None

    def __del__(self):
        del self.all_tweets
        del self.screen_name
        del self.total_tweets
        del self.replies

    @property
    def screen_name(self):  # Screen name of the Twitter account whose tweets we want to retrieve
        return self.__screen_name

    @screen_name.setter
    def screen_name(self, screen_name):
        self.__screen_name = screen_name

    @screen_name.deleter
    def screen_name(self):
        del self.__screen_name

    @property
    def total_tweets(self):  # Total number of tweets to return
        return self.__total_tweets

    @total_tweets.setter
    def total_tweets(self, total):
        self.__total_tweets = total

    @total_tweets.deleter
    def total_tweets(self):
        del self.__total_tweets

    @property
    def replies(self):
        return self.__replies

    @replies.setter
    def replies(self, replies):
        self.__replies = replies

    @replies.deleter
    def replies(self):
        del self.__replies

    @staticmethod
    def __get_tweets(total, screen_name, oldest_id=None):
        """
        :param total: Number of tweets to return
        :param screen_name: Twitter account
        :param oldest_id: ID of the oldest tweet retrieved so far
        :return: A list with up to `total` tweets from the Twitter account given by screen_name
        """
        api = TwitterConnection().api
        if oldest_id is None:
            tweets = api.user_timeline(screen_name=screen_name, count=total, include_rts=False, tweet_mode="extended")
        else:
            tweets = api.user_timeline(screen_name=screen_name, count=total, include_rts=False, max_id=oldest_id - 1, tweet_mode="extended")
        return tweets

    def get_tweets(self, total, screen_name):
        """
        Public method to get a total number of tweets from a screen name
        :param total: Total number of tweets to retrieve from a screen name
        :param screen_name: Twitter account
        :return: None; updates self.all_tweets with all the tweets retrieved
        """
        self.screen_name = screen_name
        if total <= 200:
            self.all_tweets = Tweets.__get_tweets(total, screen_name)
        else:
            counter = 200
            self.all_tweets = Tweets.__get_tweets(counter, screen_name)
            oldest_id = self.all_tweets[-1].id
            while len(self.all_tweets) < total:
                total_block_tweets = 200 if total - counter > 200 else total - counter
                tweets = Tweets.__get_tweets(total_block_tweets, screen_name, oldest_id)
                if len(tweets) > 0:
                    self.all_tweets.extend(tweets)
                    oldest_id = self.all_tweets[-1].id
                    counter = len(self.all_tweets)
                else:
                    break

    def get_replies(self, tweet_id):
        api = TwitterConnection().api
        self.replies = tweepy.Cursor(api.search, q='to:{}'.format(self.screen_name), since_id=tweet_id, tweet_mode='extended').items()

    def search_replies_to_tweet(self, tweet_id):
        while True:
            try:
                reply = self.replies.next()
                print(reply.in_reply_to_status_id)
                if reply.in_reply_to_status_id == tweet_id:
                    print("reply of tweet:{}".format(reply.full_text))
            except StopIteration:
                print("The cursor has reached its end!!!")
                break

With this code, you can get all the Tweets from the Twitter account "MovistarEstu":

def main():
    t = Tweets()
    t.get_tweets(200, "MovistarEstu")
    i = 0
    for info in t.all_tweets:
        print(f"i: {i} - ID: {info.id} - created_at: {info.created_at}")
        print(f"text: {info.full_text}\n")
        i += 1

This gets the Tweets and prints some information about them, and all of it works fine. My problem comes when I try to get all the replies to the Tweets created by "MovistarEstu" after a given ID. I get some replies, but not all of them.

For example, I get the replies to the Tweet with ID 1403443418085265411, but not to the one with ID 1391368878861824002, and I don't know why :(

With this code, I try to get all the Tweets sent to "MovistarEstu" since ID 1391364490286047238:

t.get_replies(1391364490286047238)

Now I try to get all the replies to the "MovistarEstu" Tweet with ID 1391368878861824002:

t.search_replies_to_tweet(1391368878861824002)

But I don't get anything, even though you can check on Twitter that the Tweet has replies: https://twitter.com/MovistarEstu/status/1391368878861824002

If I try to get all the replies for the ID 1403443418085265411 instead:

t.search_replies_to_tweet(1403443418085265411)

Then I can find the replies!

reply of tweet:@MovistarEstu Victoria en el 4 partido de la final 
reply of tweet:@MovistarEstu Momento que no volveremos a ver en la puta vida 
reply of tweet:@MovistarEstu Es buenísimo porque el CM del @MovistarEstu está boicoteando constantemente a su directiva haciéndonos recordar que el pasado fue glorioso y que nos han llevado a la absoluta mediocridad. 
reply of tweet:@MovistarEstu No me habéis pedido permiso para usar la foto 🤔 
reply of tweet:@MovistarEstu Yo estaba ahí con mis compis de cantera 
reply of tweet:@MovistarEstu Que salgan los toreros oh oh oh!!!!
reply of tweet:@MovistarEstu Entonces salían los toreros habitualmente, ahora sólo salen los torreznos 
reply of tweet:@MovistarEstu Cualquier tiempo pasado fue mejor. Asensio ya estaba por aquel entonces mamando del frasco? 
reply of tweet:@MovistarEstu Claro, cuando Nacho aprobó la selectividad a la 17a 
reply of tweet:@MovistarEstu 17 años ya!!! Lo recuerdo como si fuera ayer. Se forzó quinto partido de la final ACB con el Farsa. Patterson, Nicola Loncar... 
reply of tweet:@MovistarEstu Segundo partido en Vistalegre de la final de liga contra el FCBarcelona. Tremenda exhibición, ambientazo en las gradas y 2-2. Todo se decidirá en el Palau (cuando ya debía estar finiquitada la final tras algún arbitraje "ejem-ejem" en Barcelona)... 
reply of tweet:@MovistarEstu Pase a la final ACB?

What am I doing wrong?


Solution

  • From the documentation for Twitter's standard search API that Tweepy's API.search uses:

    Keep in mind that the search index has a 7-day limit. In other words, no tweets will be found for a date older than one week.

    https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/guides/standard-operators also says:

    The Search API is not a complete index of all Tweets, but instead an index of recent Tweets. The index includes between 6-9 days of Tweets.
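
To see how this applies to the two Tweets in the question, here is a minimal sketch that estimates a Tweet's creation time from its ID and compares it with the search window. It relies on Twitter's snowflake ID layout (milliseconds since 1288834974657 stored above the low 22 bits). The helper names `tweet_created_at` and `in_search_window`, the 7-day default, and the value of `now` are assumptions made for illustration; the question does not say when the code was run.

```python
from datetime import datetime, timedelta, timezone

# Offset (ms since the Unix epoch) that Twitter's snowflake IDs count from.
TWITTER_EPOCH_MS = 1288834974657


def tweet_created_at(tweet_id: int) -> datetime:
    """Approximate creation time (UTC) encoded in a snowflake tweet ID."""
    timestamp_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def in_search_window(tweet_id: int, now: datetime, window_days: int = 7) -> bool:
    """True if the tweet is recent enough for the standard search index."""
    return now - tweet_created_at(tweet_id) <= timedelta(days=window_days)


# Hypothetical "current" date, chosen only for illustration.
now = datetime(2021, 6, 14, tzinfo=timezone.utc)
for tweet_id in (1391368878861824002, 1403443418085265411):
    print(tweet_id, tweet_created_at(tweet_id).date(), in_search_window(tweet_id, now))
```

Replies are Tweets with their own IDs, so strictly it is the age of each reply that matters, not the age of the original Tweet. In practice most replies arrive shortly after the original, which would explain why the older Tweet returns nothing while the newer one does.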