I have written the following script that fetches friends of Twitter users ("barackobama" in this example) in batches of 75,000 (5,000 friends per API call x 15 API calls) every 15 minutes using rtweet. However, after the script finishes running, I find that the friend IDs repeat at a fixed interval. For instance, rows 1, 280001, and 560001 have the same ID; rows 2, 280002, and 560002 have the same ID; and so on. I'm wondering whether I'm misunderstanding next_cursor in the API.
library(rtweet)

u = "barackobama"
n_friends = lookup_users(u)$friends_count
curr_page = -1  # -1 requests the first page
fetched_friends = 0
i = 0
all_friends = NULL

while (fetched_friends < n_friends) {
  # Wait for the rate-limit window to reset before the next call
  if (rate_limit("get_friends")$remaining == 0) {
    print(paste0("API limit reached. Resetting at ", rate_limit("get_friends")$reset_at))
    Sys.sleep(as.numeric((rate_limit("get_friends")$reset + 0.1) * 60))
  }
  curr_friends = get_friends(u, n = 5000, retryonratelimit = TRUE, page = curr_page)
  i = i + 1
  all_friends = rbind(all_friends, curr_friends)
  fetched_friends = nrow(all_friends)
  print(paste0(i, ". ", fetched_friends, " out of ", n_friends, " fetched."))
  # Cursor for the next page of results
  curr_page = next_cursor(curr_friends)
}
Any help will be appreciated.
You are not doing anything wrong. From the documentation:
"this ordering is subject to unannounced change and eventual consistency issues"
For very large lists, the API simply won't return all the information you want.
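If you still want to salvage what the API does return, one option is to keep only the IDs you have not seen before and stop paging as soon as a page adds nothing new. Below is a minimal sketch of that idea in base R. Note that append_new_ids is a hypothetical helper (it is not part of rtweet), and the pages list is made-up data standing in for successive get_friends() results; IDs are kept as character strings so that large values are not rounded.

```r
# Hypothetical helper (not part of rtweet): merge one page of IDs into the
# IDs collected so far and report how many of them were actually new.
append_new_ids <- function(seen, page_ids) {
  # setdiff() keeps the order of page_ids and also drops repeats within the page
  new_ids <- setdiff(page_ids, seen)
  list(ids = c(seen, new_ids), n_new = length(new_ids))
}

# Made-up pages standing in for successive API results;
# the third page repeats the first, as in the question.
pages <- list(
  c("11", "12", "13"),
  c("14", "15"),
  c("11", "12", "13")
)

seen <- character(0)
for (page_ids in pages) {
  res <- append_new_ids(seen, page_ids)
  if (res$n_new == 0) break  # nothing new on this page: the listing has wrapped around
  seen <- res$ids
}

seen
```

In your loop, the same check would go right after the get_friends() call, using the ID column of curr_friends as page_ids. This does not recover the friends the API never returns; it only prevents the duplicates from piling up and keeps the loop from running longer than it needs to.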