
Writing URLs extracted from tweets to a CSV file


I am using the code below to extract URLs from tweets. It works perfectly and prints the full URLs. I would like to write all of these URLs to a CSV file. Ideally it would be the same file as the tweets, but a separate one is fine too. I tried different things such as ".to_csv" and the writerow function, but they didn't work, perhaps because I put them in the wrong place. Any help is appreciated!

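import re
import urllib.request

import pandas as pd
import tweepy

# consumer_key, consumer_secret, access_token and access_token_secret
# are assumed to be defined elsewhere with your Twitter API credentials.

def get_tweets(handle):
    # Fallback so the function still returns a DataFrame if the API call fails
    df = pd.DataFrame(columns=['ScreenName'])
    try:
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)
        api = tweepy.API(auth)

        number_of_tweets = 200

        tweets = api.user_timeline(screen_name=handle, count=number_of_tweets)

        print(handle, "Number of tweets extracted: {}\n".format(len(tweets)))
        df = pd.DataFrame(data=[tweet.user.screen_name for tweet in tweets], columns=['ScreenName'])

        for tweet in tweets:
            # Raw string so the backslashes in the pattern reach the regex engine unchanged
            urls = re.findall(r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+", tweet.text)
            for url in urls:
                try:
                    # Follow redirects (e.g. shortened links) to get the final URL
                    opener = urllib.request.build_opener()
                    request = urllib.request.Request(url)
                    response = opener.open(request)
                    actual_url = response.geturl()
                    print(actual_url)
                except Exception:
                    # The link could not be resolved; print the URL as found in the tweet
                    print(url)
    except Exception:
        pass
    return df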



handles = ["name of the user"]
           
         
for handle in handles:
    df_new = get_tweets(handle)

Solution

  • You could call to_csv on the DataFrame that get_tweets returns, which writes one CSV file per handle:

    handles = ["name of the user"]
    
    for handle in handles:
        df_new = get_tweets(handle)
        df_new.to_csv(path_or_buf=f"{handle}_tweets.csv", index=False)
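
Note that get_tweets as posted only stores ScreenName in the DataFrame and prints the URLs, so the URLs themselves have to be collected before they can end up in the file. Below is a minimal sketch of that step, assuming you already have the tweet texts as plain strings. The names extract_url_rows, write_url_rows and URL_PATTERN, the sample data, and the simplified regex are all made up for illustration; the longer pattern from the question could be dropped in instead, and it uses the standard csv module (the writerow approach mentioned in the question) so it runs without Twitter credentials.

```python
import csv
import os
import re
import tempfile

# Simplified URL pattern for illustration only (an assumption, not the
# pattern from the question).
URL_PATTERN = re.compile(r"https?://\S+")


def extract_url_rows(screen_name, texts):
    """Return one (screen_name, tweet_text, url) row per URL found."""
    rows = []
    for text in texts:
        for url in URL_PATTERN.findall(text):
            rows.append((screen_name, text, url))
    return rows


def write_url_rows(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["ScreenName", "Tweet", "Url"])
        writer.writerows(rows)


# Made-up sample data standing in for tweet.text values.
sample_texts = [
    "Read this https://example.com/a and http://example.org/b",
    "No links here",
]

rows = extract_url_rows("some_user", sample_texts)

# Written to the temp directory here so the example leaves the
# working directory untouched; any path works.
out_path = os.path.join(tempfile.gettempdir(), "some_user_urls.csv")
write_url_rows(rows, out_path)
```

In the original function, the same idea would mean appending actual_url (or url in the fallback branch) to a list inside the loop instead of only printing it, and building the DataFrame from that list before returning it. Tweets without any URL produce no row in this sketch; whether that is what you want depends on how you plan to use the file.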