tweepy, twitterapi-python

How to analyse tweets within date constraints?


I can already get tweets containing a certain keyword, but I need to analyse tweets from a specific year.

import datetime

import tweepy

# Authentication (access to the Twitter API)
consumerKey = 'aaaaaaaaaaaaaaaaaaaaaaa'
consumerSecret = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
accessToken = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
accessTokenSecret = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessTokenSecret)
api = tweepy.API(auth)

keyword = input('Please enter keyword or hashtag to search: ')
noOfTweet = int(input('Please enter how many tweets to analyze: '))
startDate = datetime.datetime(2010, 1, 1, 0, 0, 0)
# End of the last day of the year, so tweets from 31 December are included
endDate = datetime.datetime(2010, 12, 31, 23, 59, 59)

tweets = tweepy.Cursor(api.search, q=keyword).items(noOfTweet)

Given the Twitter developer API limit of 500k tweets per month, fetching every tweet with that keyword from the present back to the year in question (2010 in this case), and then filtering them with the code below, is impossible:

tweet_list = []
for tweet in tweets:
    # Skip tweets outside the (startDate, endDate) window
    if not (startDate < tweet.created_at < endDate):
        continue
    tweet_list.append(tweet.text)

This is because api.search seems to always start at the present and work backwards, meaning I would exhaust the 500k before even reaching tweets from 2015 (this is a guess; I haven't actually tried wasting the entire 500k XD). There's also a comment on the second answer to "tweepy get tweets between two dates" saying the until parameter still works, but I couldn't get it working with tweepy.Cursor(api.search, q=keyword, until="2000-12-31").items(noOfTweet).
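The date filter in the loop above can also be written as a small standalone function. This is only a minimal sketch: the names filter_by_date and as_utc are made up for illustration, and the sample objects are stand-ins for real tweets. It assumes that created_at may be either a naive or a timezone-aware datetime depending on the Tweepy version, and treats naive values as UTC.

```python
import datetime
from types import SimpleNamespace

UTC = datetime.timezone.utc


def as_utc(dt):
    """Return dt as timezone-aware UTC, treating naive values as UTC (an assumption)."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=UTC)
    return dt.astimezone(UTC)


def filter_by_date(tweets, start, end):
    """Keep the text of tweets with start <= created_at < end."""
    start, end = as_utc(start), as_utc(end)
    return [t.text for t in tweets if start <= as_utc(t.created_at) < end]


# Stand-in objects with the two attributes the filtering loop relies on.
sample = [
    SimpleNamespace(text="in range", created_at=datetime.datetime(2010, 6, 1, 12, 0)),
    SimpleNamespace(text="too late", created_at=datetime.datetime(2011, 1, 1, 0, 0, tzinfo=UTC)),
    SimpleNamespace(text="last day", created_at=datetime.datetime(2010, 12, 31, 18, 30)),
]
kept = filter_by_date(sample, datetime.datetime(2010, 1, 1), datetime.datetime(2011, 1, 1))
print(kept)  # ['in range', 'last day']
```

Using a half-open interval (start included, end excluded) means a whole year can be expressed as 1 January to 1 January of the next year, without having to pick a "last second" for 31 December.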


Solution

  • After searching long and hard, and even trying other methods such as making GET requests from Python, I finally found the solution: use api.search_full_archive instead of api.search. So if you're in the same situation I was in, just

    replace tweets = tweepy.Cursor(api.search, q=keyword).items(noOfTweet)

    with tweets = tweepy.Cursor(api.search_full_archive, environment_name=envtag, query=keyword, fromDate="YYYYMMDDHHmm", toDate="YYYYMMDDHHmm").items(noOfTweet)

    where envtag is a string you get by clicking the Full Archive "Set up dev environment" button in your developer account and registering a "Dev environment label". I hadn't tried search_full_archive before because it's supposedly premium, but I haven't paid a dime and it works.

    Also, the until parameter on the normal api.search returns nothing if you choose a date more than 7 days in the past. I lost quite a bit of time trying to get that working.
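For reference, here is roughly how the replacement call can be put together with the "YYYYMMDDHHmm" strings built from datetime objects. Treat it as a sketch under assumptions rather than the definitive way to do it: to_archive_timestamp and search_archive are hypothetical helper names, api is assumed to be an already authenticated tweepy.API object, and the environment_name keyword follows the call shown above (other Tweepy versions may name that parameter differently). As far as I know, the premium search endpoint interprets these timestamps as UTC.

```python
import datetime


def to_archive_timestamp(dt):
    """Format a datetime as the YYYYMMDDHHmm string used for fromDate/toDate."""
    return dt.strftime("%Y%m%d%H%M")


def search_archive(api, env_label, keyword, start, end, limit):
    """Return an iterator over up to `limit` tweets matching `keyword`.

    `api` is an authenticated tweepy.API object and `env_label` is the
    dev environment label; both are supplied by the caller.
    """
    import tweepy  # imported lazily so the helper above works without it

    cursor = tweepy.Cursor(
        api.search_full_archive,
        environment_name=env_label,  # keyword name as used in the answer above
        query=keyword,
        fromDate=to_archive_timestamp(start),
        toDate=to_archive_timestamp(end),
    )
    return cursor.items(limit)


from_date = to_archive_timestamp(datetime.datetime(2010, 1, 1))
to_date = to_archive_timestamp(datetime.datetime(2011, 1, 1))
print(from_date, to_date)  # 201001010000 201101010000
```

With that in place, something like tweets = search_archive(api, envtag, keyword, startDate, endDate, noOfTweet) would take the place of the original tweepy.Cursor(api.search, ...) line.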