i'm trying to download twitter followers from a list of accounts. my function (that uses twython) works pretty well for short account lists but rise an error for longer lists. it is not a RateLimit problem since my function sleeps until the next time bin if the rate limit is hit. the error is this
twythonerror: ('Connection aborted.', error(10054, ''))
others seem to have the same problem and the proposed solution is to make the function sleep between different REST API calls so i implemented the following code
del twapi
sleep(nap[afternoon])
afternoon = afternoon + 1
twapi = Twython(app_key=app_key, app_secret=app_secret,
oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
nap is a list of intervals in seconds and afternoon is an index. despite this suggestion i still have the exact same problem. it seems that the sleep doesen't resolve the problem. can anyone help me?
here is the whole finction
def download_follower(serie_lst):
"""Creates account named txt files containing followers ids. Uses for loop on accounts names list."""
nap = [1, 2, 4, 8, 16, 32, 64, 128]
afternoon = 0
for exemplar in serie_lst:
#username from serie_lst entries
account_name = exemplar
twapi = Twython(app_key=app_key, app_secret=app_secret,
oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
try:
#initializations
del twapi
if afternoon >= 7:
afternoon =0
sleep(nap[afternoon])
afternoon = afternoon + 1
twapi = Twython(app_key=app_key, app_secret=app_secret,
oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
next_cursor = -1
result = {}
result["screen_name"] = ""
result["followers"] = []
iteration = 0
file_name = ""
#user info
user = twapi.lookup_user(screen_name = account_name)
#store user name
result['screen_name'] = account_name
#loop until all cursored results are stored
while (next_cursor != 0):
sleep(random.randrange(start = 1, stop = 15, step = 1))
call_result = twapi.get_followers_ids(screen_name = account_name, cursor = next_cursor)
#loop over each entry of followers id and append each entry to results_follower
for i in call_result["ids"]:
result["followers"].append(i)
next_cursor = call_result["next_cursor"] #new next_cursor
iteration = iteration + 1
if (iteration > 13): #skip sleep if all cursored pages are processed
error_msg = localtime()
error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
error_msg ="".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
print(error_msg)
del error_msg
sleep(901) #15min + 1sec
iteration = 0
#output file
file_name = "".join([account_name, ".txt"])
#print output
out_file = open(file_name, "w") #open file "account_name.txt"
#out_file.write(str(result["followers"])) #standard format
for i in result["followers"]: #R friendly table format
out_file.write(str(i))
out_file.write("\n")
out_file.close()
except twython.TwythonRateLimitError:
#wait
error_msg = localtime()
error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
error_msg ="".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
print(error_msg)
del error_msg
del twapi
sleep(901) #15min + 1sec
#initializations
if afternoon >= 7:
afternoon =0
sleep(nap[afternoon])
afternoon = afternoon + 1
twapi = Twython(app_key=app_key, app_secret=app_secret,
oauth_token=oauth_token, oauth_token_secret=oauth_token_secret)
next_cursor = -1
result = {}
result["screen_name"] = ""
result["followers"] = []
iteration = 0
file_name = ""
#user info
user = twapi.lookup_user(screen_name = account_name)
#store user name
result['screen_name'] = account_name
#loop until all cursored results are stored
while (next_cursor != 0):
sleep(random.randrange(start = 1, stop = 15, step = 1))
call_result = twapi.get_followers_ids(screen_name = account_name, cursor = next_cursor)
#loop over each entry of followers id and append each entry to results_follower
for i in call_result["ids"]:
result["followers"].append(i)
next_cursor = call_result["next_cursor"] #new next_cursor
iteration = iteration + 1
if (iteration > 13): #skip sleep if all cursored pages are processed
error_msg = localtime()
error_msg = "".join([str(error_msg.tm_mon), "/", str(error_msg.tm_mday), "/", str(error_msg.tm_year), " at ", str(error_msg.tm_hour), ":", str(error_msg.tm_min)])
error_msg = "".join(["Twitter API Request Rate Limit hit on ", error_msg, ", wait..."])
print(error_msg)
del error_msg
sleep(901) #15min + 1sec
iteration = 0
#output file
file_name = "".join([account_name, ".txt"])
#print output
out_file = open(file_name, "w") #open file "account_name.txt"
#out_file.write(str(result["followers"])) #standard format
for i in result["followers"]: #R friendly table format
out_file.write(str(i))
out_file.write("\n")
out_file.close()
As discussed in the comments, there are a few issues with your code at present. You shouldn't need to delete your connection for it to function properly, and I think the issue comes because you initialise for a second time without having any catches for hitting your rate limit. Here is an example using Tweepy
of how you can get the information you require:
import tweepy
from datetime import datetime
def download_followers(user, api):
all_followers = []
try:
for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
all_followers.extend(map(str, page))
return all_followers
except tweepy.TweepError:
print('Could not access user {}. Skipping...'.format(user))
# Include your keys below:
consumer_key = 'YOUR_KEY'
consumer_secret = 'YOUR_KEY'
access_token = 'YOUR_KEY'
access_token_secret = 'YOUR_KEY'
# Set up tweepy API, with handling of rate limits
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
main_api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# List of usernames to get followers for
lookup_users = ['asongtoruin', 'mbiella']
for username in lookup_users:
user_followers = download_followers(username, main_api)
if user_followers:
with open(username + '.txt', 'w') as outfile:
outfile.write('\n'.join(user_followers))
print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))
Tweepy is clever enough to know when it has hit its rate limit when we use wait_on_rate_limit=True
, and checks how long it needs to sleep for before it can start again. By using wait_on_rate_limit_notify=True
, we allow it to paste out how long it will be waiting until it can next get a page of followers (through this ID-based method, it seems as though there are 5000 IDs per page).
We additionally catch a TweepError
exception - this can occur if the username provided relates to a protected account for which our authenticated user does not have permission to view. In this case, we simply skip the user to allow other information to be downloaded, but print out a warning that the user could not be accessed.
Running this saves a text file of follower ids for any user it can access. For me this prints the following:
Rate limit reached. Sleeping for: 593
Finished outputting: asongtoruin at 2017/02/22 11:43:12
Could not access user mbiella. Skipping...
With the follower IDs of asongtoruin
(aka me) saved as asongtoruin.txt
There is one possible issue, in that our pages of followers start from the newest first. This could (though I don't understand the API well enough to say with certainty) result in issues with our output dataset if new users are added between our calls, as we may both miss these users and end up with duplicates in our dataset. If duplicates become an issue, you could change return all_followers
to return set(all_followers)