I scraped tweets with tweepy, using code based on the first answer to this question. The code is as follows:
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)
query = 'kubernetes'
max_tweets = 200
searched_tweets = []
last_id = -1
while len(searched_tweets) < max_tweets:
    count = max_tweets - len(searched_tweets)
    try:
        new_tweets = api.search(q=query, count=count, max_id=str(last_id - 1))
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        break
It returns a list of tweet objects. For example, searched_tweets[2] gives the following
output (truncated):
Status(_api=<tweepy.api.API object at 0x7fc13dbab828>, _json={'created_at': 'Wed Jun 10 14:06:51 +0000 2020', 'id': 1270719075388280834, 'id_str': '1270719075388280834', 'text': "RT @CDWGWAGov: According to @IBM's new CEO, #hybridcloud & #AI are the two dominant forces driving #digitaltransformation #Kubernetes #IoT…", 'truncated': False,
I need the creation date and the tweet text, so I used the following code to extract them:
for tweet in searched_tweets:
    new_tweet = json.dumps(tweet)
    dct = json.loads(new_tweet._json)
    created_at=dct['created_at']
    txt=dct['text']
but it raises
TypeError: Object of type 'Status' is not JSON serializable
To fix this error I tried the suggested solution api = tweepy.API(auth, parser=tweepy.parsers.JSONParser()),
but that gives KeyError: -1
I have tried almost every other solution on Stack Overflow, but nothing has worked for me. Can someone help me unpack the JSON and get those two values? Thank you.
The tweepy Status object itself is not JSON serializable, but it has a _json property that holds the raw tweet data as a dict, and that dict can be JSON serialized.
For example:
import json
# user_handler is a placeholder for the account whose timeline you want
status_list = api.user_timeline(user_handler)
status = status_list[0]
json_str = json.dumps(status._json)
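To see the difference without calling the Twitter API, here is a minimal sketch. FakeStatus is a hypothetical stand-in for tweepy's Status class (it is not part of tweepy) and the payload is made-up sample data. The only assumption is that, like Status, it keeps the raw dict in a _json attribute.

```python
import json


class FakeStatus:
    """Hypothetical stand-in for tweepy's Status: keeps the raw payload in _json."""

    def __init__(self, payload):
        self._json = payload


status = FakeStatus({"id": 1, "text": "hello"})

# Passing the wrapper object itself fails, because json does not know
# how to encode an arbitrary class instance.
try:
    json.dumps(status)
    serializable = True
except TypeError:
    serializable = False

# Passing the underlying dict works.
json_str = json.dumps(status._json)

print(serializable)
print(json_str)
```

The same pattern should apply to a real Status object, since the output in the question shows that it carries its data in _json.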
The error is raised by this line in your loop:
new_tweet = json.dumps(tweet)
so access the _json property on that line instead:
new_tweet = json.dumps(tweet._json)
Then adjust the code that follows: new_tweet is now a string, so pass it directly to json.loads rather than new_tweet._json. This should solve your problem.
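Since _json is already a dict, the dumps/loads round trip may not be needed at all if you only want two fields. The following is a rough sketch of that idea, not tested against the live API. The searched_tweets list here is built from SimpleNamespace stand-ins with made-up sample data, and extract_fields is a hypothetical helper name.

```python
from types import SimpleNamespace

# Stand-ins for tweepy Status objects (hypothetical sample data, not real API output).
searched_tweets = [
    SimpleNamespace(_json={"created_at": "Wed Jun 10 14:06:51 +0000 2020", "text": "first tweet"}),
    SimpleNamespace(_json={"created_at": "Wed Jun 10 14:07:10 +0000 2020", "text": "second tweet"}),
]


def extract_fields(tweets):
    """Return (created_at, text) pairs read from each tweet's _json dict."""
    rows = []
    for tweet in tweets:
        dct = tweet._json  # already a dict, so no json.dumps/json.loads is needed
        rows.append((dct["created_at"], dct["text"]))
    return rows


rows = extract_fields(searched_tweets)
for created_at, txt in rows:
    print(created_at, txt)
```

With the real list from api.search, you would pass searched_tweets to the helper unchanged. As for the KeyError: -1, a likely cause is that with JSONParser the search call returns a plain dict rather than a list, so new_tweets[-1] becomes a key lookup for -1.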