
How to create pandas dataframe from Twitter Search API?


I am working with the Twitter Search API which returns a dictionary of dictionaries. My goal is to create a dataframe from a list of keys in the response dictionary.

Example of API response here: Example Response

I have a list of keys from the statuses dictionary:

keys = ["created_at", "text", "in_reply_to_screen_name", "source"]

I would like to loop through each key value returned in the Statuses dictionary and put them in a dataframe with the keys as the columns.

I currently have code that loops through a single key, collects the values in a list, and adds that list to the dataframe, but I want a way to handle more than one key at a time. Current code below:

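import pandas as pd

# w is the word to be queried
w = 'keyword'
# count of tweets to return
count = 1000

# API call; assumes `twitter` is an authenticated client created earlier
query = twitter.search.tweets(q=w, count=count)

def data_l2(q, k1, k2):
    """Collect q[k1][i][k2] for every result."""
    return [result[k2] for result in q[k1]]

def data_l3(q, k1, k2, k3):
    """Same as data_l2, one level deeper: q[k1][i][k2][k3]."""
    return [result[k2][k3] for result in q[k1]]

tweets = data_l2(query, "statuses", "text")
screen_names = data_l3(query, "statuses", "user", "screen_name")

data = {'screen_names': screen_names,
        'tweets': tweets}
frame = pd.DataFrame(data)
frame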

Solution

  • I will share a more generic solution that I came up with while working with the Twitter API. Say you have the IDs of the tweets you want to fetch in a list called my_ids:

    # Fetch tweets from the Twitter API (`api` is an authenticated tweepy API object):
    list_of_tweets = []
    # IDs whose tweets can't be fetched are saved in the list below:
    cant_find_tweets_for_those_ids = []
    for each_id in my_ids:
        try:
            list_of_tweets.append(api.get_status(each_id))
        except Exception:
            cant_find_tweets_for_those_ids.append(each_id)
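
The same collect-and-skip pattern can be tried without API credentials. The sketch below is an illustration only: `fetch_all`, `fake_get_status`, and the sample store are hypothetical names and made-up data, and the stand-in fetcher raises `KeyError` where a real client would raise its own error type.

```python
def fetch_all(ids, fetch_one):
    """Call fetch_one on every ID and return (results, failed_ids)."""
    found, missing = [], []
    for each_id in ids:
        try:
            found.append(fetch_one(each_id))
        except Exception:
            # Broad on purpose for this sketch; narrow it to the client's error class.
            missing.append(each_id)
    return found, missing


# Hypothetical stand-in for api.get_status, backed by a small dict.
_FAKE_STORE = {1: {"id": 1, "text": "first"}, 3: {"id": 3, "text": "third"}}


def fake_get_status(tweet_id):
    return _FAKE_STORE[tweet_id]  # raises KeyError for unknown IDs


found, missing = fetch_all([1, 2, 3], fake_get_status)
```

Keeping the failed IDs in their own list makes it possible to retry them later or report them, instead of losing them silently.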
    

    Next, we take the JSON part of each downloaded tweepy status object and collect them all in a list...

    my_list_of_dicts = []
    for each_json_tweet in list_of_tweets:
        my_list_of_dicts.append(each_json_tweet._json)
    

    ...and we write this list into a txt file:

    import json

    with open('tweet_json.txt', 'w') as file:
        file.write(json.dumps(my_list_of_dicts, indent=4))
    

    Now we create a DataFrame from the tweet_json.txt file (I picked the keys that were relevant to my use case, but you can use your own keys instead):

    import pandas as pd

    my_demo_list = []
    with open('tweet_json.txt', encoding='utf-8') as json_file:  
        all_data = json.load(json_file)
        for each_dictionary in all_data:
            tweet_id = each_dictionary['id']
            whole_tweet = each_dictionary['text']
            only_url = whole_tweet[whole_tweet.find('https'):]
            favorite_count = each_dictionary['favorite_count']
            retweet_count = each_dictionary['retweet_count']
            created_at = each_dictionary['created_at']
            whole_source = each_dictionary['source']
            only_device = whole_source[whole_source.find('rel="nofollow">') + 15:-4]
            source = only_device
            # The 'retweeted_status' key is only present on retweets
            retweeted_status = each_dictionary.get('retweeted_status', 'Original tweet')
            if retweeted_status == 'Original tweet':
                url = only_url
            else:
                retweeted_status = 'This is a retweet'
                url = 'This is a retweet'
    
            my_demo_list.append({'tweet_id': str(tweet_id),
                                 'favorite_count': int(favorite_count),
                                 'retweet_count': int(retweet_count),
                                 'url': url,
                                 'created_at': created_at,
                                 'source': source,
                                 'retweeted_status': retweeted_status,
                                })

    # Build the DataFrame once, after every tweet has been collected
    tweet_json = pd.DataFrame(my_demo_list, columns=['tweet_id', 'favorite_count',
                                                     'retweet_count', 'created_at',
                                                     'source', 'retweeted_status', 'url'])
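
For the list of keys in the question, a shorter route may be to flatten the statuses with `pd.json_normalize` and then select the columns. This is a sketch rather than something checked against the live API: it assumes the response is a plain dict with a `statuses` list, the sample data is made up, and `statuses_to_frame` is a hypothetical helper name. Nested fields become dotted column names such as `user.screen_name`.

```python
import pandas as pd


def statuses_to_frame(response, keys):
    """Build a DataFrame with one row per status and one column per key."""
    flat = pd.json_normalize(response["statuses"])
    # reindex keeps the requested column order and fills keys absent from the data with NaN
    return flat.reindex(columns=keys)


# Made-up sample shaped like the response described in the question.
sample = {
    "statuses": [
        {"created_at": "Mon Jan 01 00:00:00 +0000 2018", "text": "hello",
         "in_reply_to_screen_name": None, "source": "web",
         "user": {"screen_name": "alice"}},
        {"created_at": "Tue Jan 02 00:00:00 +0000 2018", "text": "world",
         "in_reply_to_screen_name": "alice", "source": "web",
         "user": {"screen_name": "bob"}},
    ]
}

keys = ["created_at", "text", "in_reply_to_screen_name", "source", "user.screen_name"]
frame = statuses_to_frame(sample, keys)
```

Using `reindex` instead of plain column selection means a key that is missing from every status yields an empty column rather than a `KeyError`.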