Tags: python, api, twitter, twython, twitter-rest-api

Twitter API connection aborted with Twython


I'm trying to download the Twitter followers of a list of accounts. My function (which uses Twython) works well for short account lists but raises an error for longer ones. It is not a rate-limit problem, since my function sleeps until the next time window whenever the rate limit is hit. The error is this:

TwythonError: ('Connection aborted.', error(10054, ''))

Others seem to have the same problem, and the proposed solution is to make the function sleep between REST API calls, so I implemented the following code:

    del twapi
    sleep(nap[afternoon])
    afternoon = afternoon + 1
    twapi = Twython(app_key=app_key, app_secret=app_secret,
                    oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)

nap is a list of intervals in seconds and afternoon is an index. Despite this suggestion I still get exactly the same error; the sleep doesn't seem to resolve the problem. Can anyone help me?
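
For clarity, the pattern I was aiming for is a retry with exponential backoff, roughly like the sketch below (call_with_backoff is a hypothetical helper of mine, not part of Twython):

    from time import sleep
    from twython import TwythonError

    def call_with_backoff(api_call, *args, **kwargs):
        """Retry an API call, sleeping longer after each connection error."""
        nap = [1, 2, 4, 8, 16, 32, 64, 128]
        for pause in nap:
            try:
                return api_call(*args, **kwargs)
            except TwythonError as err:
                print("Connection error ({0}), retrying in {1}s...".format(err, pause))
                sleep(pause)
        raise RuntimeError("Giving up after {0} attempts".format(len(nap)))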

Here is the whole function:

import random
from time import localtime, sleep

import twython
from twython import Twython


def download_follower(serie_lst):
    """Creates account named txt files containing followers ids. Uses for loop on accounts names list."""
    nap = [1, 2, 4, 8, 16, 32, 64, 128]    
    afternoon = 0

    for exemplar in serie_lst:

        #username from serie_lst entries
        account_name = exemplar

        twapi = Twython(app_key=app_key, app_secret=app_secret,
                        oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)

        try:
            #initializations
            del twapi
            if afternoon >= 7:
                afternoon = 0

            sleep(nap[afternoon])
            afternoon = afternoon + 1
            twapi = Twython(app_key=app_key, app_secret=app_secret,
                            oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
            next_cursor = -1
            result = {}
            result["screen_name"] = ""
            result["followers"] = []
            iteration = 0
            file_name = ""

            #user info
            user = twapi.lookup_user(screen_name = account_name)

            #store user name
            result['screen_name'] = account_name

            #loop until all cursored results are stored
            while (next_cursor != 0):
                sleep(random.randrange(start = 1, stop = 15, step = 1))
                call_result = twapi.get_followers_ids(screen_name = account_name, cursor = next_cursor)
                #append each follower id from this page to result["followers"]
                for i in call_result["ids"]:
                    result["followers"].append(i)
                next_cursor = call_result["next_cursor"] #new next_cursor
                iteration = iteration + 1
                if (iteration > 13): #after ~14 pages, wait out the 15-minute rate-limit window
                    error_msg = localtime()
                    error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
                    error_msg ="".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
                    print(error_msg)
                    del error_msg
                    sleep(901) #15min + 1sec
                    iteration = 0

            #output file
            file_name = "".join([account_name, ".txt"])

            #print output
            out_file = open(file_name, "w") #open file "account_name.txt"
            #out_file.write(str(result["followers"])) #standard format
            for i in result["followers"]: #R friendly table format
                out_file.write(str(i))
                out_file.write("\n")
            out_file.close()

        except twython.TwythonRateLimitError:
            #wait
            error_msg = localtime()
            error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
            error_msg ="".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
            print(error_msg)
            del error_msg
            del twapi
            sleep(901) #15min + 1sec

            #initializations
            if afternoon >= 7:
                afternoon = 0

            sleep(nap[afternoon])
            afternoon = afternoon + 1
            twapi = Twython(app_key=app_key, app_secret=app_secret,
                            oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
            next_cursor = -1
            result = {}
            result["screen_name"] = ""
            result["followers"] = []
            iteration = 0
            file_name = ""

            #user info
            user = twapi.lookup_user(screen_name = account_name)

            #store user name
            result['screen_name'] = account_name

            #loop until all cursored results are stored
            while (next_cursor != 0):
                sleep(random.randrange(start = 1, stop = 15, step = 1))
                call_result = twapi.get_followers_ids(screen_name = account_name, cursor = next_cursor)
                #append each follower id from this page to result["followers"]
                for i in call_result["ids"]:
                    result["followers"].append(i)
                next_cursor = call_result["next_cursor"] #new next_cursor
                iteration = iteration + 1
                if (iteration > 13): #after ~14 pages, wait out the 15-minute rate-limit window
                    error_msg = localtime()
                    error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
                    error_msg = "".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
                    print(error_msg)
                    del error_msg
                    sleep(901) #15min + 1sec
                    iteration = 0

            #output file
            file_name = "".join([account_name, ".txt"])

            #print output
            out_file = open(file_name, "w") #open file "account_name.txt"
            #out_file.write(str(result["followers"])) #standard format
            for i in result["followers"]: #R friendly table format
                out_file.write(str(i))
                out_file.write("\n")
            out_file.close()

Solution

  • As discussed in the comments, there are a few issues with your code at present. You shouldn't need to delete your connection for it to function properly, and I think the issue arises because you initialise a second time without any handling for hitting your rate limit. Here is an example using Tweepy of how you can get the information you require:

    import tweepy
    from datetime import datetime
    
    
    def download_followers(user, api):
        all_followers = []
        try:
            for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
                all_followers.extend(map(str, page))
            return all_followers
        except tweepy.TweepError:
            print('Could not access user {}. Skipping...'.format(user))
    
    # Include your keys below:
    consumer_key = 'YOUR_KEY'
    consumer_secret = 'YOUR_KEY'
    access_token = 'YOUR_KEY'
    access_token_secret = 'YOUR_KEY'
    
    # Set up tweepy API, with handling of rate limits
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    main_api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
    
    # List of usernames to get followers for
    lookup_users = ['asongtoruin', 'mbiella']
    
    for username in lookup_users:
        user_followers = download_followers(username, main_api)
        if user_followers:
            with open(username + '.txt', 'w') as outfile:
                outfile.write('\n'.join(user_followers))
            print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))
    

    Tweepy is clever enough to know when it has hit its rate limit when we use wait_on_rate_limit=True, and it checks how long it needs to sleep before it can start again. By using wait_on_rate_limit_notify=True, we let it print out how long it will be waiting until it can next get a page of followers (through this ID-based method, it seems there are 5000 IDs per page).
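
    If you want to inspect the remaining call budget yourself, Tweepy also exposes the API's rate_limit_status method; a quick check of the followers/ids bucket (the resource path below assumes the standard v1.1 response layout) looks like this:

    # Peek at the rate-limit bucket for the followers/ids endpoint
    # (layout as returned by the v1.1 rate_limit_status endpoint).
    status = main_api.rate_limit_status()
    bucket = status['resources']['followers']['/followers/ids']
    print('{0} of {1} calls remaining; window resets at {2}'.format(
        bucket['remaining'], bucket['limit'], bucket['reset']))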

    We additionally catch a TweepError exception, which can occur if the username provided relates to a protected account for which our authenticated user does not have permission to view. In this case, we simply skip the user so that other information can still be downloaded, but print a warning that the user could not be accessed.
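
    If you would rather detect protected accounts up front instead of relying on the exception, a pre-check along these lines is one option (a sketch, and a conservative one: a protected account that your authenticated user follows is actually readable):

    def can_read_followers(api, screen_name):
        """Return False for accounts we most likely cannot read (sketch)."""
        try:
            user = api.get_user(screen_name=screen_name)
        except tweepy.TweepError:
            return False  # e.g. suspended or nonexistent account
        return not user.protected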

    Running this saves a text file of follower IDs for any user it can access. For me this prints the following:

    Rate limit reached. Sleeping for: 593
    Finished outputting: asongtoruin at 2017/02/22 11:43:12
    Could not access user mbiella. Skipping...
    

    With the follower IDs of asongtoruin (aka me) saved as asongtoruin.txt.

    There is one possible issue: our pages of followers start with the newest first. This could (though I don't understand the API well enough to say with certainty) cause problems in our output dataset if new followers are added between our calls, as we might both miss those users and end up with duplicates. If duplicates become an issue, you could change return all_followers to return set(all_followers).
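
    Note that return set(all_followers) discards that newest-first ordering. If you want to drop duplicates while keeping the order of first appearance, a small helper like this (hypothetical, not part of Tweepy) does the job:

    def dedupe_keep_order(ids):
        """Remove duplicate IDs, keeping the first occurrence of each."""
        seen = set()
        unique = []
        for follower_id in ids:
            if follower_id not in seen:
                seen.add(follower_id)
                unique.append(follower_id)
        return unique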