apikey = '2238c8h8E25gSVU1WW28ti7fS7'
apisecretkey = 'ssLG9s4rt4QwLo6PFyMSpLVRT1IoQ3f1EwrrgzTg6TRJLUTeI5e'
accesstoken = '33347844103627698178-3HuOoCCFuMWHwLTmhswKUtJSvG22et'
accesstokensecret = '2s8tAcatrjTHgh81Oo7dw6rvWGGRFZoSrPDa5eInY22Q3c'
import tweepy as tw
auth = tw.OAuthHandler(apikey,apisecretkey) # OAuthHandler is required for authentication with Twitter
auth.set_access_token(accesstoken,accesstokensecret)
api = tw.API(auth,wait_on_rate_limit=True)
search_word = '#IndvsAus' or '#AusvsInd'
date_since = '2021-01-10'
date_until = '2021-01-11'
tweets = tw.Cursor(api.search,q = search_word+' -filter:retweets',\
lang ='en',tweet_mode='extended',since='date_since',until='date_until').items(100)
tweet_details = [[tweet.id,tweet.source,tweet.full_text,tweet.user.location,tweet.user.created_at,tweet.user.verified,tweet.created_at]for tweet in tweets]
import pandas as pd
tweet100_df = pd.DataFrame(data = tweet_details,columns=['tweet_id','source','Full_text','User_location','User_created_at','User_verified','tweet_timestamp',])
pd.set_option('max_colwidth',800)
tweet100_df.head(20)
Output: tweet_id source Full_text User_location User_created_at User_verified tweet_timestamp
The output shows no tweets, only the column headings. Where am I going wrong?
To get the output of Tweepy's Cursor into a pandas DataFrame, one straightforward approach is to pass pd.DataFrame
a list of dictionaries and name the fields you are interested in as columns.
Each Status object yielded by the Cursor's items() method keeps the raw API response
as a dictionary in its _json attribute.
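In your case, pass the date variables themselves (the quoted 'date_since' and 'date_until' send that literal text, not the dates), and append the _json of each tweet rather than of the cursor:
tweets = tw.Cursor(api.search, q=search_word + ' -filter:retweets',
                   lang='en', tweet_mode='extended',
                   since=date_since, until=date_until).items(100)
list_of_dicts = []
for each_json_tweet in tweets:
    list_of_dicts.append(each_json_tweet._json)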
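The search term in the question is also worth a look. In Python, `or` between two non-empty strings evaluates to the first one, so only `#IndvsAus` is ever sent. A minimal sketch of building a query that covers both hashtags follows; the `hashtags` and `query` names are illustrative, and the parenthesised `OR` form assumes the operator syntax of Twitter's standard search.

```python
# Python's `or` returns the first truthy operand, so the second hashtag is dropped.
search_word = '#IndvsAus' or '#AusvsInd'
print(search_word)

# Assumption: Twitter's search syntax accepts OR between terms, grouped with parentheses.
hashtags = ['#IndvsAus', '#AusvsInd']  # illustrative name, not from the question
query = '(' + ' OR '.join(hashtags) + ') -filter:retweets'
print(query)
```

The resulting `query` string would then be passed as `q=query` in the `tw.Cursor` call.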
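And then you can build the DataFrame. The column names have to match keys that actually exist in the tweet JSON (for example id, source, full_text, created_at); names such as tweet_id that are not keys come back as empty columns. The user fields are nested under the user key, so they need to be flattened separately, for example with pd.json_normalize:
tweet100_df = pd.DataFrame(data=list_of_dicts, columns=['id', 'source', 'full_text', 'created_at'])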
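To end up with the seven column names from the question, one option is to flatten each tweet dictionary by hand before building the DataFrame. The sketch below uses made-up sample dictionaries shaped like `tweet._json` (only the keys used here are included), and `flatten_tweet` is a hypothetical helper, not part of Tweepy.

```python
# Hypothetical sample shaped like tweet._json; all values are made up.
sample_tweets = [
    {
        "id": 1,
        "source": "Twitter Web App",
        "full_text": "Sample tweet text #IndvsAus",
        "created_at": "Sun Jan 10 12:00:00 +0000 2021",
        "user": {
            "location": "Sydney",
            "created_at": "Mon Jan 01 00:00:00 +0000 2018",
            "verified": False,
        },
    },
    {
        "id": 2,
        "source": "Twitter for Android",
        "full_text": "Another sample tweet #AusvsInd",
        "created_at": "Sun Jan 10 13:30:00 +0000 2021",
        "user": {
            "location": "",
            "created_at": "Tue Feb 02 00:00:00 +0000 2016",
            "verified": True,
        },
    },
]


def flatten_tweet(tweet_json):
    """Map one tweet dictionary to the seven flat columns used in the question."""
    user = tweet_json.get("user", {})
    return {
        "tweet_id": tweet_json.get("id"),
        "source": tweet_json.get("source"),
        "Full_text": tweet_json.get("full_text"),
        "User_location": user.get("location"),
        "User_created_at": user.get("created_at"),
        "User_verified": user.get("verified"),
        "tweet_timestamp": tweet_json.get("created_at"),
    }


# One flat dictionary per tweet; the keys become the column names.
rows = [flatten_tweet(t) for t in sample_tweets]
```

With real data, `rows` would be built from `list_of_dicts` instead of `sample_tweets`, and `pd.DataFrame(rows)` then gives one column per key without needing a `columns` argument.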