Tags: python-3.x, pandas, twitter, tweepy, sentiment-analysis

Unable to retrieve tweets with Tweepy (no error, but the output has only column headings)


apikey = 'YOUR_API_KEY'
apisecretkey = 'YOUR_API_SECRET_KEY'
accesstoken = 'YOUR_ACCESS_TOKEN'
accesstokensecret = 'YOUR_ACCESS_TOKEN_SECRET'

import tweepy as tw

auth = tw.OAuthHandler(apikey, apisecretkey)  # OAuthHandler is required for authentication with Twitter
auth.set_access_token(accesstoken, accesstokensecret)

api = tw.API(auth,wait_on_rate_limit=True)

search_word = '#IndvsAus' or '#AusvsInd'
date_since = '2021-01-10'
date_until = '2021-01-11'

tweets = tw.Cursor(api.search,q = search_word+' -filter:retweets',\
                   lang ='en',tweet_mode='extended',since='date_since',until='date_until').items(100)

tweet_details = [[tweet.id,tweet.source,tweet.full_text,tweet.user.location,tweet.user.created_at,tweet.user.verified,tweet.created_at]for tweet in tweets]

import pandas as pd
tweet100_df = pd.DataFrame(data = tweet_details,columns=['tweet_id','source','Full_text','User_location','User_created_at','User_verified','tweet_timestamp',])
pd.set_option('max_colwidth',800)
tweet100_df.head(20)

Output: tweet_id source Full_text User_location User_created_at User_verified tweet_timestamp

The output shows no tweets, only the column headings. Where am I going wrong?


Solution

  • One way to load the output of Tweepy's Cursor into a pandas DataFrame is to pass pd.DataFrame a list of dictionaries (one per tweet) and the keys you are interested in as column names.

    Each Status object yielded by the Cursor's items() method keeps the raw API response as a dictionary in its _json attribute.

    In your case:

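    search_word = '#IndvsAus OR #AusvsInd'  # Python's `or` would keep only '#IndvsAus'

    # Pass the variables, not the strings 'date_since'/'date_until'.
    # Standard search only indexes roughly the last 7 days of tweets.
    # (Tweepy 4.x renamed api.search to api.search_tweets.)
    tweets = tw.Cursor(api.search, q=search_word + ' -filter:retweets',
                       lang='en', tweet_mode='extended',
                       since=date_since, until=date_until).items(100)

    # Each Status object keeps the raw API response as a dict in its _json attribute
    list_of_dicts = []
    for each_json_tweet in tweets:
        list_of_dicts.append(each_json_tweet._json)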
    

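    And then build the DataFrame. The column names have to match keys in the _json dictionaries (id, source, full_text, created_at, ...); a name that is not a key, such as 'tweet_id', gives a column of NaN, and user fields such as location live inside the nested 'user' dictionary:

    tweet100_df = pd.DataFrame(data=list_of_dicts,
                               columns=['id', 'source', 'full_text', 'created_at'])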
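A minimal offline sketch of the DataFrame step, assuming pandas 1.0 or later (for pd.json_normalize). The two dictionaries are invented stand-ins for each_json_tweet._json that keep only a handful of the real v1.1 tweet keys, so it runs without calling the Twitter API. It shows that column names which are not JSON keys come back as NaN, and one possible way (not the only one) to flatten the nested user fields and rename them to the question's headings; the `columns` mapping is purely illustrative.

```python
import pandas as pd

# Hypothetical stand-ins for each_json_tweet._json; real tweets carry many more keys
list_of_dicts = [
    {"id": 1, "source": "web", "full_text": "first tweet",
     "created_at": "Sun Jan 10 10:00:00 +0000 2021",
     "user": {"location": "Sydney", "created_at": "Mon Jan 01 00:00:00 +0000 2018",
              "verified": False}},
    {"id": 2, "source": "app", "full_text": "second tweet",
     "created_at": "Sun Jan 10 11:00:00 +0000 2021",
     "user": {"location": "", "created_at": "Tue Feb 02 00:00:00 +0000 2016",
              "verified": True}},
]

# A column name that is not a key in the dicts ('tweet_id') becomes an all-NaN column
wrong = pd.DataFrame(data=list_of_dicts, columns=["tweet_id", "full_text"])

# json_normalize flattens the nested 'user' dict into 'user.location', 'user.verified', ...
flat = pd.json_normalize(list_of_dicts)

# Illustrative mapping: flattened JSON key -> heading used in the question
columns = {
    "id": "tweet_id",
    "source": "source",
    "full_text": "Full_text",
    "user.location": "User_location",
    "user.created_at": "User_created_at",
    "user.verified": "User_verified",
    "created_at": "tweet_timestamp",
}
tweet100_df = flat[list(columns)].rename(columns=columns)

print(wrong)
print(tweet100_df)
```

With real data, list_of_dicts would come from the Cursor loop instead of the literals above; everything after that line stays the same.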