What I want to do is live stream Tweets from Twitter's API: only for the hashtag 'Brexit', only in English, and only up to a specific number of Tweets (1k - 2k).
So far my code will live stream the Tweets, but whichever way I modify it, it either ignores the count and streams indefinitely, or I get errors. If I change it to stream only a specific user's Tweets, the count works but it ignores the hashtag. If I stream everything for the given hashtag, it completely ignores the count. I've had a decent go at trying to fix it, but I am quite inexperienced and have really hit a brick wall with it.
If I could get some help with how to tick all these boxes at the same time, it would be much appreciated! The code below will just stream 'Brexit' Tweets indefinitely, so it ignores the count=10.
The bottom of the code is a bit of a mess due to me playing with it, apologies:
import numpy as np
import pandas as pd
import tweepy
from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import Twitter_Credentials
import matplotlib.pyplot as plt
# Twitter client - hash out to stream all
class TwitterClient:
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user

    def get_twitter_client_api(self):
        return self.twitter_client

# Twitter authenticator
class TwitterAuthenticator:
    def authenticate_twitter_app(self):
        auth = OAuthHandler(Twitter_Credentials.consumer_key, Twitter_Credentials.consumer_secret)
        auth.set_access_token(Twitter_Credentials.access_token, Twitter_Credentials.access_secret)
        return auth

class TwitterStreamer():
    # Class for streaming and processing live Tweets
    def __init__(self):
        self.twitter_authenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # this handles Twitter authentication and connection to Twitter API
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_authenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)
        # This line filters Twitter stream to capture data by keywords
        stream.filter(track=hash_tag_list)

# Twitter stream listener
class TwitterListener(StreamListener):
    # This is a listener class that prints incoming Tweets to stdout
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename

    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True

    def on_error(self, status):
        if status == 420:
            # Return false on data in case rate limit occurs
            return False
        print(status)

class TweetAnalyzer():
    # Functionality for analysing and categorising content from tweets
    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['tweets'])
        df['id'] = np.array([tweet.id for tweet in tweets])
        df['len'] = np.array([len(tweet.text) for tweet in tweets])
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['source'] = np.array([tweet.source for tweet in tweets])
        df['likes'] = np.array([tweet.favorite_count for tweet in tweets])
        df['retweets'] = np.array([tweet.retweet_count for tweet in tweets])
        return df

if __name__ == "__main__":
    auth = OAuthHandler(Twitter_Credentials.consumer_key, Twitter_Credentials.consumer_secret)
    auth.set_access_token(Twitter_Credentials.access_token, Twitter_Credentials.access_secret)
    api = tweepy.API(auth)
    for tweet in Cursor(api.search, q="#brexit", count=10,
                        lang="en",
                        since="2019-04-03").items():
        fetched_tweets_filename = "tweets.json"
        twitter_streamer = TwitterStreamer()
        hash_tag_list = ["Brexit"]
        twitter_streamer.stream_tweets(fetched_tweets_filename, hash_tag_list)

You're trying to use two different methods of accessing the Twitter API - Streaming is realtime, and searching is a one-off API call.
Since streaming is continuous and realtime, there's no way to apply a count of results to it - the code simply opens a connection, says "hey, send me all the Tweets from now onwards that contain the terms in hash_tag_list", and sits listening. At that point you then drop into the StreamListener, where each Tweet received is written to a file.
You could apply a counter here, but you'd need to put it inside your StreamListener's on_data handler and increment it for each Tweet received. When you get to 1000 Tweets, stop listening (in tweepy 3.x, returning False from on_data closes the stream).
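As a rough sketch of that counter, assuming tweepy 3.x (where StreamListener exists and returning False from on_data disconnects the stream): the class name CountingListener and the max_tweets parameter are made up for illustration, and the check for a "text" key is just one assumed way of skipping non-Tweet messages such as limit notices.

```python
import json

try:
    # tweepy 3.x, as used in the question
    from tweepy.streaming import StreamListener
except ImportError:
    # Fallback so the counting logic can be tried without tweepy 3.x installed
    StreamListener = object


class CountingListener(StreamListener):
    """Hypothetical listener: appends Tweets to a file, stops after max_tweets."""

    def __init__(self, fetched_tweets_filename, max_tweets=1000):
        super().__init__()
        self.fetched_tweets_filename = fetched_tweets_filename
        self.max_tweets = max_tweets
        self.tweet_count = 0

    def on_data(self, data):
        try:
            message = json.loads(data)
        except ValueError:
            return True  # not valid JSON; skip it and keep listening
        if not isinstance(message, dict) or "text" not in message:
            return True  # assumed to be a notice rather than a Tweet
        with open(self.fetched_tweets_filename, "a") as tf:
            tf.write(data.strip() + "\n")  # one JSON object per line
        self.tweet_count += 1
        # Returning False once the limit is reached closes the stream
        return self.tweet_count < self.max_tweets

    def on_error(self, status):
        if status == 420:
            return False  # stop on rate limiting, as in the question
        print(status)


# Usage inside stream_tweets from the question (needs valid credentials):
# listener = CountingListener(fetched_tweets_filename, max_tweets=1000)
# stream = Stream(auth, listener)
# stream.filter(track=hash_tag_list, languages=["en"])
```

The languages=["en"] argument to stream.filter is there for the English-only requirement; it is shown commented out because it needs a live connection.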
For the search option, you have a couple of issues... the first one is that you're asking for Tweets since a fixed date in 2019, but the standard search API can only go back 7 days in time. You've also asked for count=10 there, but with Cursor that sets the number of Tweets per page rather than a total; to cap the total, pass the limit to .items() instead (for example .items(1000)). The way you've written the loop though, what's actually happening is that for each Tweet that the search returns, you then create a realtime streaming connection and start listening and writing to a file. So that's not going to work.
You'll need to choose one - either search for 1000 Tweets and write them to a file (never set up TwitterStreamer()), or listen for 1000 Tweets and write them to a file (drop the for tweet in Cursor(api.search... loop and jump straight to the streamer).
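For the search-only route, a minimal sketch might look like the following. It assumes tweepy 3.x, where Cursor(...).items(n) caps the total number of results and each returned Status keeps its raw payload in _json; the helper name save_tweets is hypothetical.

```python
import itertools
import json


def save_tweets(statuses, filename, limit=1000):
    """Append up to `limit` Tweets from an iterable of Status objects to a file.

    Each Tweet is written as one JSON object per line. Returns the number written.
    """
    written = 0
    with open(filename, "a") as tf:
        for status in itertools.islice(statuses, limit):
            tf.write(json.dumps(status._json) + "\n")
            written += 1
    return written


# Usage with the `api` object from the question (needs valid credentials):
# from tweepy import Cursor
# results = Cursor(api.search, q="#brexit", lang="en", count=100).items(1000)
# save_tweets(results, "tweets.json", limit=1000)
```

Here count=100 only sets the page size, while items(1000) and limit bound the total. No TwitterStreamer is created at all on this route.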