
Looping through names on a list to run through another program in Python


I've been trying to run a list of Twitter user names through a Python script that downloads their tweet history from Twitter's API. I have the user names in a CSV file, which I tried to import into a list and then pass to the script one at a time using a for loop. However, I get the error below; it seems to be dumping the entire list into the script at once:

<ipython-input-24-d7d2e882d84c> in get_all_tweets(screen_name)
     60 
     61         #write the csv
---> 62         with open('%s_tweets.csv' % screen_name, 'wb') as f:
     63                 writer = csv.writer(f)
     64                 writer.writerow(["id","created_at","text"])

IOError: [Errno 36] File name too long: '0       TonyAbbottMHR\n1              AlboMP\n2     JohnAlexanderMP\n3      karenandrewsmp\n4

For brevity's sake, I am just including a list in the code, with the importing of names from the csv to the list commented out.

Apologies, but running the script requires Twitter API credentials. My code is below:

#!/usr/bin/env python
# encoding: utf-8

import tweepy #https://github.com/tweepy/tweepy
import csv
import os
import pandas as pd

#Twitter API credentials
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

os.chdir('file/dir/path')

mps = ['TonyAbbottMHR', 'AlboMP', 'JohnAlexanderMP', 'karenandrewsmp']
#df = pd.read_csv('twitMP.csv')

#for row in df:
    #mps.append(df.AccName)   

def get_all_tweets(screen_name):
    #Twitter only allows access to a user's most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []  

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        print "getting tweets before %s" % (oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    #transform the tweepy tweets into a 2D array that will populate the csv 
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text.encode("utf-8")] for tweet in alltweets]

    #write the csv  
    with open('%s_tweets.csv' % screen_name, 'wb') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)


if __name__ == '__main__':
    #pass in the username of the account you want to download
    for i in range(len(mps)):
        get_all_tweets(mps[i])

Solution

  • It seems that this

    #df = pd.read_csv('twitMP.csv')
    
    #for row in df:
        #mps.append(df.AccName) 
    

    portion of your code is giving you trouble.

    There are two problems here.

    Problem 1

    When you iterate over a DataFrame, you iterate over its column names, not its rows, so this loop does not do what you want. You can see this by running list(df), which returns a list of the column names.
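
A minimal sketch of Problem 1, using a small in-memory DataFrame as a stand-in for twitMP.csv. The AccName column and the account names come from the question; the Party column and its values are hypothetical and only there to show a second column.

```python
import pandas as pd

# Stand-in for pd.read_csv('twitMP.csv'); "Party" is a made-up extra column.
df = pd.DataFrame({
    "AccName": ["TonyAbbottMHR", "AlboMP", "JohnAlexanderMP", "karenandrewsmp"],
    "Party": ["A", "B", "A", "A"],
})

# A plain for loop over the DataFrame visits column labels, not rows.
columns_seen = [col for col in df]
print(columns_seen)

# list(df) gives the same labels, which is a quick way to check.
print(list(df))
```

The loop body runs once per column, so with this frame it runs twice, regardless of how many rows the file has.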

    Problem 2

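    When you append df.AccName, you are appending the entire column, not a single name. So mps ends up as a list of pandas Series, one per column in the file, each identical to df.AccName. Passing one of those to get_all_tweets formats the whole column into the file name, which is why the error shows every account name at once.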
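
A rough reproduction of Problem 2, again with an in-memory DataFrame standing in for the CSV file (an assumption; the real file may have more columns). It shows that each element of mps is a whole Series and that formatting it into the file name template from the question produces a long multi-line string.

```python
import pandas as pd

# Stand-in for pd.read_csv('twitMP.csv') with a single AccName column.
df = pd.DataFrame({
    "AccName": ["TonyAbbottMHR", "AlboMP", "JohnAlexanderMP", "karenandrewsmp"],
})

mps = []
for row in df:              # runs once per column label
    mps.append(df.AccName)  # appends the entire column each time

first = mps[0]
print(type(first))

# The same formatting that get_all_tweets applies to screen_name.
filename = "%s_tweets.csv" % first
print(repr(filename))
```

The resulting string contains the index, every account name, and newlines, which is the kind of value that triggers "File name too long".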

    Solution

    All you need to do is

    df = pd.read_csv('twitMP.csv')
    mps = df.AccName.tolist() #or df.AccName.astype(str).tolist() if they aren't strings, but they should be
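
To try the fix without the real file, here is a sketch that reads the same kind of data from an in-memory string. The CSV text is an assumed layout (a header row named AccName followed by one name per line), and the dropna() call is an optional extra that skips blank cells; neither comes from the question.

```python
import io

import pandas as pd

# Assumed file layout: one header row, then one account name per line.
csv_text = "AccName\nTonyAbbottMHR\nAlboMP\nJohnAlexanderMP\nkarenandrewsmp\n"

df = pd.read_csv(io.StringIO(csv_text))

# Select the column, drop missing values, and convert to a list of strings.
mps = df["AccName"].dropna().astype(str).tolist()
print(mps)
```

Each element of mps is now a single plain string, so it can be passed to get_all_tweets one at a time.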
    

    Bonus

    When you loop over mps, try using enumerate. It gives you both the index and the name on each iteration, and the code is cleaner in my opinion:

    for i, name in enumerate(mps):
        get_all_tweets(name)
    

    You can still use the index of name (i) however you like within each iteration.
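
As an illustration of using the index, here is a small sketch that prints progress while looping. The function fake_download is a hypothetical stand-in for get_all_tweets so the example runs without Twitter credentials; it only builds the output file name.

```python
mps = ["TonyAbbottMHR", "AlboMP", "JohnAlexanderMP", "karenandrewsmp"]


def fake_download(screen_name):
    """Hypothetical stand-in for get_all_tweets: returns the output file name."""
    return "%s_tweets.csv" % screen_name


results = []
# start=1 makes the counter human-friendly (1 of 4 rather than 0 of 4).
for i, name in enumerate(mps, start=1):
    print("Account %d of %d: %s" % (i, len(mps), name))
    results.append((i, fake_download(name)))
```

If the index is not needed at all, a plain "for name in mps:" loop does the same job.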