Novice programmer here seeking help. I have a list of hashtags for which I want to get all the historical tweets from 01-01-2015 to 31-12-2018.
I tried to use the Tweepy library but it only allows access for the last 7 days of tweets. I also tried to use GetOldTweets as it gives access to historical tweets but it kept continuously crashing. So now I have acquired premium API access for Twitter which also gives me access to the full historic tweets.
In order to do do my query with the premium API I cannot use the Tweepy Library (as it does not have a link with the premium APIs right?) and my choices are between TwitterAPI and Search-Tweets.
1- Does TwitterAPI and Search-Tweets supply information regarding the user name, user location, if the user is verified, the language of the tweet, the source of the tweet, the count of the retweets and favourites and the date for each tweet? (As tweepy does). I could not find any information about this.
2- Can I supply a time span in my query?
3- How do I do all of this?
This was my code for the Tweepy library:
hashtags = ["#AAPL","#FB","#KO","#ABT","#PEPCO",...]
df = pd.DataFrame(columns = ["Hashtag", "Tweets", "User", "User_Followers",
"User_Location", "User_Verified", "User_Lang", "User_Status",
"User_Method", "Fav_Count", "RT_Count", "Tweet_date"])
def tweepy_df(df,tags):
for cash in tags:
i = len(df)+1
for tweet in tweepy.Cursor(api.search, q= cash, since = "2015-01-01", until = "2018-12-31").items():
print(i, end = '\r')
df.loc[i, "Hashtag"] = cash
df.loc[i, "Tweets"] = tweet.text
df.loc[i, "User"] = tweet.user.name
df.loc[i, "User_Followers"] = tweet.followers_count
df.loc[i, "User_Location"] = tweet.user.location
df.loc[i, "User_Verified"] = tweet.user.verified
df.loc[i, "User_Lang"] = tweet.lang
df.loc[i, "User_Status"] = tweet.user.statuses_count
df.loc[i, "User_Method"] = tweet.source
df.loc[i, "Fav_Count"] = tweet.favorite_count
df.loc[i, "RT_Count"] = tweet.retweet_count
df.loc[i, "Tweet_date"] = tweet.created_at
i+=1
return df
How do I adapt this for, for example, the Twitter API Library?
I know that it should be adapted to something like this:
for tweet in api.request('search/tweets', {'q':cash})
But it is still missing the desired timespan. And I'm not sure if the names for the characteristics match the ones for this libraries.
Using TwitterAPI, you can make Premium Search requests this way:
from TwitterAPI import TwitterAPI
SEARCH_TERM = '#AAPL OR #FB OR #KO OR #ABT OR #PEPCO'
PRODUCT = 'fullarchive'
LABEL = 'your label'
api = TwitterAPI('consumer key', 'consumer secret', 'access token key', 'access token secret')
r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL), {'query':SEARCH_TERM})
for item in r:
if 'text' in item:
print(item['text'])
print(item['user']['name'])
print(item['followers_count'])
print(item['user']['location'])
print(item['user']['verified'])
print(item['lang'])
print(item['user']['statuses_count'])
print(item['source'])
print(item['favorite_count'])
print(item['retweet_count'])
print(item['created_at'])
The Premium search doc explains the supported request arguments. To do a date range use this:
r = api.request('tweets/search/%s/:%s' % (PRODUCT, LABEL),
{'query':SEARCH_TERM, 'fromDate':201501010000, 'toDate':201812310000})