I am creating a Java application that interacts with Twitter using the Twitter4J library. I want to download 10,000 nodes from Twitter and then compute statistics on the resulting graph. The graph is initially saved in a dataset (a .txt file). I must also save enough retweets for each node (so I have to check their timelines).
Skipping over the instantiation of the Twitter object used to perform queries, I have two problems/doubts: 1) How do I handle the fact that the Twitter API allows only a limited number of requests per 15-minute window?
I tried this:
public static RateLimitStatus getApplicationRateLimitStatus(Twitter twitter)
{
    try {
        Map<String, RateLimitStatus> rateLimitStatus = twitter.getRateLimitStatus();
        // The map is keyed by endpoint, so a direct lookup replaces the loop
        return rateLimitStatus.get("/application/rate_limit_status");
    } catch (TwitterException te) {
        te.printStackTrace();
        System.out.println("Failed to get rate limit status: " + te.getMessage());
        System.exit(-1);
    }
    return null;
}
public static void control(Twitter twitter)
{
    RateLimitStatus app_rate_limit_st = null;
    RateLimitStatus user_timeline_limit_st = null;
    RateLimitStatus credentials_limit_st = null;
    RateLimitStatus friends_list_limit_st = null;
    RateLimitStatus followers_list_limit_st = null;
    int ctr_req = 7;
    try {
        if ((app_rate_limit_st = MyRateLimit.getApplicationRateLimitStatus(twitter)).getRemaining() < ctr_req)
        {
            System.out.println("I'm waiting " + app_rate_limit_st.getSecondsUntilReset()
                    + " seconds for Application Rate Limit, app requests remaining: "
                    + app_rate_limit_st.getRemaining());
            Thread.sleep((long) app_rate_limit_st.getSecondsUntilReset() * 1000);
            System.out.println("I woke up!!!");
        }
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
In this block of code I check only the ApplicationRateLimitStatus request type, but in my application I also check the FriendsList, FollowersList, UserTimeline and Credentials request types.
Still, when I run my application, I get a notification that I have exceeded the number of available requests, and I can't understand why.
2) Another problem is which algorithm I should use to download the nodes. I thought about starting from a popular node (one with many friends and followers who interacts a lot with them). I take that node, then its friends and followers, then the friends and followers of those friends, and the friends and followers of those followers. Is this a sensible technique? Is there a better one?
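For what it's worth, the crawl described above is essentially a breadth-first traversal from a seed node. A minimal sketch follows; the neighbor fetcher is abstracted as a Function so it can be tested without the Twitter API (in the real application it would wrap calls like twitter.getFriendsIDs and twitter.getFollowersIDs, which is an assumption about how you'd plug it in):

```java
import java.util.*;
import java.util.function.Function;

public class SnowballCrawler {

    /** Breadth-first crawl from a seed id, capped at maxNodes distinct nodes. */
    public static List<Long> crawl(long seedId,
                                   Function<Long, List<Long>> neighbors,
                                   int maxNodes) {
        Set<Long> visited = new LinkedHashSet<>();  // preserves discovery order
        Deque<Long> queue = new ArrayDeque<>();
        queue.add(seedId);
        visited.add(seedId);
        while (!queue.isEmpty() && visited.size() < maxNodes) {
            long current = queue.poll();
            for (long next : neighbors.apply(current)) {
                if (visited.size() >= maNodesCapReached(visited, maxNodes)) break;
                if (visited.add(next)) {   // add() is true only for unseen nodes
                    queue.add(next);
                }
            }
        }
        return new ArrayList<>(visited);
    }

    private static int maNodesCapReached(Set<Long> visited, int maxNodes) {
        return maxNodes;  // helper kept trivial; inlined check would do as well
    }
}
```

The visited set is what stops you re-downloading a node reachable through several paths, which matters a lot in dense social graphs.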
Thanks.
Just a thought on how you can work around the rate limit problem: create multiple Twitter OAuth credentials. You can maintain a list/set of Twitter instances, each configured with one of the available credentials; when you reach the rate limit for, say, id1, you can switch to id2 to fetch records.
Instead of using getApplicationRateLimitStatus, check the functional rate limit status (the limit of the specific endpoint you are about to call) and switch on that; this lets you plan the switch based on the remaining limit for that API.
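The switching decision itself is simple once you have the remaining counts. A sketch of that selection logic, with the per-credential remaining counts passed in as plain integers so it is testable (in real code you would fill the map from RateLimitStatus.getRemaining() for the endpoint in question, e.g. "/followers/list" — pickCredential is a hypothetical helper name, not a Twitter4J method):

```java
import java.util.*;

public class ConnectorPicker {

    /**
     * Picks the credential with the most remaining requests, provided it has
     * at least minRemaining left; returns null if every credential is exhausted.
     */
    public static String pickCredential(Map<String, Integer> remainingByCredential,
                                        int minRemaining) {
        String best = null;
        int bestRemaining = minRemaining - 1;  // anything below minRemaining never wins
        for (Map.Entry<String, Integer> e : remainingByCredential.entrySet()) {
            if (e.getValue() > bestRemaining) {
                bestRemaining = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

When pickCredential returns null, the caller can fall back to sleeping until the earliest getSecondsUntilReset() among the credentials.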
-- Adding code/comments below as per the review comments.
You can do something like the following. For each request you use a connector; in your case you may also need to cache some information needed to make the next call, such as sinceId and maxId.
You would need to create/register multiple Twitter accounts and generate credentials for each of them. I tried this approach to speed up fetching information for about 1M users, and it was effective.
You can also cache some recurring information and save a few hits to Twitter: in a network of 10 people, some percentage of users/followers is likely to be shared, so a previously fetched user can be skipped in the next lookup request.
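That caching idea can be as simple as a map in front of the lookup call. A sketch, with the expensive fetch abstracted as a Function so it runs without the API (in the real code the fetcher would wrap something like twitter.showUser or a batched lookup):

```java
import java.util.*;
import java.util.function.Function;

public class CachedUserLookup<V> {
    private final Map<Long, V> cache = new HashMap<>();
    private final Function<Long, V> fetcher;  // the real, rate-limited API call
    public int apiHits = 0;                   // counts actual fetches, for illustration

    public CachedUserLookup(Function<Long, V> fetcher) {
        this.fetcher = fetcher;
    }

    public V lookup(long userId) {
        // Only cache misses reach the API; repeated ids cost nothing.
        return cache.computeIfAbsent(userId, id -> {
            ++apiHits;
            return fetcher.apply(id);
        });
    }
}
```

In an overlapping follower network the hit rate of such a cache climbs quickly, which directly translates into fewer requests counted against your 15-minute window.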
The getTweetConnector() method ensures you get a connector whose rate limit has been reset.
Since you are fetching information through multiple APIs, you can batch up connectors per request type, so that APIs with a higher rate limit get more connectors.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import twitter4j.RateLimitStatus;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

public class TweetUserInfo {

    private Set<Twitter> mTwitterConnectorsSet;
    private BufferedReader mUserFileReader;

    TweetUserInfo() {
        mTwitterConnectorsSet = new HashSet<Twitter>();
    }

    // Reads credentials from a file: four lines per account
    // (consumerkey=..., consumersecret=..., accesstoken=..., accesstokensecret=...)
    private void initTweetConnectors(String inFile) {
        BufferedReader br = null;
        try {
            String line = null;
            String[] lines = new String[4];
            int linesIndex = 0;
            br = new BufferedReader(new FileReader(inFile));
            while ((line = br.readLine()) != null) {
                if (linesIndex == 4) {
                    createAndAddTwitterConnector(lines);
                    linesIndex = 0;
                }
                lines[linesIndex] = line;
                ++linesIndex;
            }
            if (linesIndex == 4) {
                createAndAddTwitterConnector(lines);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (br != null) br.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }

    private void createAndAddTwitterConnector(String[] lines) {
        ConfigurationBuilder twitterConfigBuilder = new ConfigurationBuilder();
        twitterConfigBuilder.setDebugEnabled(true);
        for (int i = 0; i < lines.length; ++i) {
            String[] input = lines[i].split("=");
            if (input[0].equalsIgnoreCase("consumerkey")) {
                twitterConfigBuilder.setOAuthConsumerKey(input[1]);
            }
            if (input[0].equalsIgnoreCase("consumersecret")) {
                twitterConfigBuilder.setOAuthConsumerSecret(input[1]);
            }
            if (input[0].equalsIgnoreCase("accesstoken")) {
                twitterConfigBuilder.setOAuthAccessToken(input[1]);
            }
            if (input[0].equalsIgnoreCase("accesstokensecret")) {
                twitterConfigBuilder.setOAuthAccessTokenSecret(input[1]);
            }
        }
        Twitter twitter = new TwitterFactory(twitterConfigBuilder.build()).getInstance();
        mTwitterConnectorsSet.add(twitter);
    }

    private Twitter getTweetConnector() {
        for (Twitter tc : mTwitterConnectorsSet) {
            try {
                // Fetch the status map once per connector: getRateLimitStatus()
                // is itself a rate-limited API call, so don't repeat it.
                Map<String, RateLimitStatus> status = tc.getRateLimitStatus();
                if (status != null && status.get("/users/lookup") != null) {
                    System.out.println("tc - " + tc);
                    System.out.println("tc rate - " + status.get("/users/lookup").getRemaining());
                    if (status.get("/users/lookup").getRemaining() > 2) {
                        return tc;
                    }
                }
            } catch (TwitterException e) {
                e.printStackTrace();
            }
        }
        return null;
    }
}
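As for giving higher-limit APIs more connectors, the split can be made proportional to each endpoint's per-window limit. A sketch of that allocation arithmetic (the limits below are illustrative placeholders, not authoritative Twitter numbers):

```java
import java.util.*;

public class ConnectorAllocator {

    /**
     * Distributes totalConnectors across endpoints proportionally to each
     * endpoint's per-window request limit, giving every endpoint at least one.
     */
    public static Map<String, Integer> allocate(Map<String, Integer> limitPerEndpoint,
                                                int totalConnectors) {
        int totalLimit = 0;
        for (int l : limitPerEndpoint.values()) totalLimit += l;
        Map<String, Integer> allocation = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : limitPerEndpoint.entrySet()) {
            int share = Math.max(1, totalConnectors * e.getValue() / totalLimit);
            allocation.put(e.getKey(), share);
        }
        return allocation;
    }
}
```

The idea is simply that a connector pool serving a generous endpoint drains slower than one serving a stingy endpoint, so sizing pools by limit keeps all of them usable for roughly the same stretch of each window.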
Hope this helps.