I created a tweepy listener to collect tweets into a local MongoDB during the first presidential debate, but I have realized that the tweets I collected are truncated at 140 characters. I had defined tweet_mode='extended' in my stream, which I thought would resolve this, but I still cannot retrieve the full text of tweets longer than 140 characters. Below is my code:
auth.set_access_token(twitter_credentials.ACCESS_TOKEN, twitter_credentials.ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Create a listener MyListener that streams and stores tweets to a local MongoDB
class MyListener(StreamListener):
    def __init__(self):
        super().__init__()
        self.list_of_tweets = deque([], maxlen=5)

    def on_data(self, data):
        try:
            tweet_text = json.loads(data)
            self.list_of_tweets.append(tweet_text)
            self.print_list_of_tweets()
            db['09292020'].insert_one(tweet_text)
        except Exception:
            # Errors are silently ignored so the stream keeps running
            pass

    def on_error(self, status):
        print(status)

    def print_list_of_tweets(self):
        display.clear_output(wait=True)
        for index, tweet_text in enumerate(self.list_of_tweets):
            m = '{}. {}\n\n'.format(index, tweet_text)
            print(m)

debate_stream = Stream(auth, MyListener(), tweet_mode='extended')
debate_stream = debate_stream.filter(track=['insert', 'debate', 'keywords', 'here'])
Any input into how I can obtain the full extended tweet via this listener would be greatly appreciated!
tweet_mode=extended has no effect on the legacy standard streaming API, as Tweets are delivered in both truncated (140) and extended (280) form by default.
So you'll want your Stream Listener set up like this:
debate_stream = Stream(auth, MyListener())
What you should be seeing is that the JSON object for longer Tweets has a text field of 140 characters, but contains an additional dictionary called extended_tweet, which in turn contains a full_text field with the full Tweet text.
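As a rough sketch of how that lookup might work on the parsed JSON, the helper below prefers extended_tweet's full_text and falls back to text. The name get_full_text is hypothetical (it is not part of tweepy), and reading from retweeted_status for retweets is an assumption about what you want stored, since the outer text of a retweet is also cut short.

```python
def get_full_text(tweet):
    """Return the untruncated text of a streamed Tweet payload (a dict).

    Hypothetical helper, not part of tweepy.
    """
    # Assumption: for retweets we want the original Tweet's text, which
    # lives under "retweeted_status"; otherwise use the Tweet itself.
    source = tweet.get("retweeted_status", tweet)

    # Longer Tweets carry the complete text in extended_tweet["full_text"].
    extended = source.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]

    # Short Tweets have no extended_tweet; their "text" is already complete.
    return source.get("text", "")


# Made-up payloads for illustration only
short_tweet = {"text": "hello debate"}
long_tweet = {
    "text": "x" * 137 + "...",
    "truncated": True,
    "extended_tweet": {"full_text": "x" * 200},
}

print(get_full_text(short_tweet))
print(len(get_full_text(long_tweet)))
```

Inside on_data you could then call get_full_text(tweet_text) after json.loads(data) and store the result alongside the raw document, for example under a separate key, so the original payload is left untouched.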