Search code examples
twittertweepytwython

Twitter Streaming API to follow thousands of users


I'm considering using the Twitter Streaming API (public streams) to keep track of the latest tweets for many users (up to 100k). Despite having read various sources regarding the different rate limits, I still have couple of questions:

  • According to the documentation: The default access level allows up to 400 track keywords, 5,000 follow userids. What are the best practices to follow more the 5k users. Creating, for example, 20 applications to get 20 different access tokens?

  • If I follow just one single user, does the rule of thumb "You get about 1% of all tweets" indeed apply? And how does this changes if I add more users up to 5k?

  • Might using the REST API be a reasonable alternative somehow, e.g., by polling the latest tweets of users on a minute-by-minute basis?


Solution

  • What are the best practices to follow more the 5k users. Creating, for example, 20 applications to get 20 different access tokens?

    You don't want to use multiple applications. This response from a mod sums up the situation well. The Twitter Streaming API documentation also specifically calls out devs who attempt to do this:

    Each account may create only one standing connection to the public endpoints, and connecting to a public stream more than once with the same account credentials will cause the oldest connection to be disconnected.

    Clients which make excessive connection attempts (both successful and unsuccessful) run the risk of having their IP automatically banned.

    A rate limit is a rate limit--you can't get more than Twitter allows.

    If I follow just one single user, does the rule of thumb "You get about 1% of all tweets" indeed apply? And how does this changes if I add more users up to 5k?

    The 1% rule still applies, but it is very unlikely impossible for one user to be responsible for at least 1% of all tweet volume in a given time interval. More users means more tweets, but unless all 5k are very high-volume tweet-ers you shouldn't have a problem.

    Might using the REST API be a reasonable alternative somehow, e.g., by polling the latest tweets of users on a minute-by-minute basis?

    Interesting idea, but probably not. You're also rate-limited in the Search API. For GET/statuses/user_timeline, the rate limit is 180 queries per 15 minutes. You can only get the tweets for one user with this endpoint, and the regular GET/search/tweets doesn't accept user id as a parameter, so you can't take advantage of that (also 180 query/15 min rate limited).

    The Twitter Streaming and REST API overviews are excellent and merit a thorough reading. Tweepy unfortunately has spotty documentation and Twython isn't too much better, but they both leverage the Twitter APIs directly so this will give you a good understanding of how everything works. Good luck!