Twitter API: How to search tweets based on query words and predetermined time span + tweets characteristics


Novice programmer here seeking help. I have a list of hashtags for which I want to get all the historical tweets from 01-01-2015 to 31-12-2018.

I tried the Tweepy library, but it only gives access to tweets from the last 7 days. I also tried GetOldTweets, which does give access to historical tweets, but it kept crashing. I have now acquired premium API access for Twitter, which also gives me access to the full tweet archive.

To run my query against the premium API I cannot use the Tweepy library (it does not support the premium APIs, right?), so my choice is between TwitterAPI and Search-Tweets.

1- Do TwitterAPI and Search-Tweets supply the user name, user location, whether the user is verified, the language of the tweet, the source of the tweet, the retweet and favourite counts, and the date for each tweet (as Tweepy does)? I could not find any information about this.

2- Can I supply a time span in my query?

3- How do I do all of this?

This was my code for the Tweepy library:

import pandas as pd
import tweepy

# `api` is assumed to be an authenticated tweepy.API instance.
hashtags = ["#AAPL","#FB","#KO","#ABT","#PEPCO",...]

df = pd.DataFrame(columns = ["Hashtag", "Tweets", "User", "User_Followers",
"User_Location", "User_Verified", "User_Lang", "User_Status", 
"User_Method", "Fav_Count", "RT_Count", "Tweet_date"])

def tweepy_df(df,tags):
    for cash in tags:
        i = len(df)+1
        for tweet in tweepy.Cursor(api.search, q= cash, since = "2015-01-01", until = "2018-12-31").items():
            print(i, end = '\r')
            df.loc[i, "Hashtag"] = cash
            df.loc[i, "Tweets"] = tweet.text
            df.loc[i, "User"] = tweet.user.name
            df.loc[i, "User_Followers"] = tweet.user.followers_count
            df.loc[i, "User_Location"] = tweet.user.location
            df.loc[i, "User_Verified"] = tweet.user.verified
            df.loc[i, "User_Lang"] = tweet.lang
            df.loc[i, "User_Status"] = tweet.user.statuses_count
            df.loc[i, "User_Method"] = tweet.source
            df.loc[i, "Fav_Count"] = tweet.favorite_count
            df.loc[i, "RT_Count"] = tweet.retweet_count
            df.loc[i, "Tweet_date"] = tweet.created_at
            i+=1
    return df

How do I adapt this to, for example, the TwitterAPI library?

I know that it should be adapted to something like this:

for tweet in api.request('search/tweets', {'q':cash})

But this is still missing the desired time span, and I'm not sure whether the field names match the ones these libraries use.


Solution

  • Using TwitterAPI, you can make Premium Search requests this way:

    from TwitterAPI import TwitterAPI
    SEARCH_TERM = '#AAPL OR #FB OR #KO OR #ABT OR #PEPCO'
    PRODUCT = 'fullarchive'
    LABEL = 'your label'
    api = TwitterAPI('consumer key', 'consumer secret', 'access token key', 'access token secret')
    r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL), {'query':SEARCH_TERM})
    for item in r:
        if 'text' in item:
            print(item['text'])
            print(item['user']['name'])
            print(item['user']['followers_count'])
            print(item['user']['location'])
            print(item['user']['verified'])
            print(item['lang'])
            print(item['user']['statuses_count'])
            print(item['source'])
            print(item['favorite_count'])
            print(item['retweet_count'])
            print(item['created_at'])
    

    The Premium search doc explains the supported request arguments. To restrict the search to a date range, use this:

    # fromDate and toDate are UTC timestamps in yyyyMMddHHmm format, passed as strings.
    r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL), 
                    {'query':SEARCH_TERM, 'fromDate':'201501010000', 'toDate':'201812310000'})
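
To get from the printed fields to rows like the ones in the question's DataFrame, something along these lines might work. It is only a sketch: the helper names `premium_date`, `tweet_to_row` and `tweets_to_rows` are made up for illustration, and it assumes each item has the standard tweet layout with a nested `user` object, as in the loop above. It uses only the standard library, so it can be tried without credentials by feeding it plain dicts.

```python
from datetime import datetime

# Column names copied from the question's DataFrame.
COLUMNS = ["Hashtag", "Tweets", "User", "User_Followers", "User_Location",
           "User_Verified", "User_Lang", "User_Status", "User_Method",
           "Fav_Count", "RT_Count", "Tweet_date"]


def premium_date(dt):
    """Format a datetime as a yyyyMMddHHmm string for fromDate/toDate."""
    return dt.strftime("%Y%m%d%H%M")


def tweet_to_row(item, hashtag):
    """Map one tweet dict to a flat row keyed by the question's column names.

    Assumption: `item` follows the standard tweet layout, with user-level
    fields nested under item['user']. Missing fields become None.
    """
    user = item.get("user", {})
    return {
        "Hashtag": hashtag,
        "Tweets": item.get("text"),
        "User": user.get("name"),
        "User_Followers": user.get("followers_count"),
        "User_Location": user.get("location"),
        "User_Verified": user.get("verified"),
        "User_Lang": item.get("lang"),
        "User_Status": user.get("statuses_count"),
        "User_Method": item.get("source"),
        "Fav_Count": item.get("favorite_count"),
        "RT_Count": item.get("retweet_count"),
        "Tweet_date": item.get("created_at"),
    }


def tweets_to_rows(items, hashtag):
    """Collect rows for every item that looks like a tweet (has 'text')."""
    return [tweet_to_row(item, hashtag) for item in items if "text" in item]
```

With a live response, `rows = tweets_to_rows(r, '#AAPL')` followed by `pd.DataFrame(rows, columns=COLUMNS)` should give the same columns as the Tweepy version, and building the DataFrame once from a list avoids the slow row-by-row `df.loc[i, ...]` assignment. As far as I know `toDate` is exclusive, so an end value of `premium_date(datetime(2019, 1, 1))` would probably be needed to include 31-12-2018; the Premium search doc should confirm this.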