I just finished the Codecademy IBM Watson course, which is written in Python 2. When I moved the script over to Python 3, I kept getting the error below. The script and all the credentials worked fine on Codecademy. Is this because I'm now running Python 3, or is there a problem in the code?
Traceback (most recent call last):
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 58, in <module>
    user_result = analyze(user_handle)
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 22, in analyze
    text += status.text.encode('utf-8')
TypeError: must be str, not bytes
Does anyone know what's wrong? The code is below:
import sys
import operator
import requests
import json
import twitter
from watson_developer_cloud import PersonalityInsightsV2 as PersonalityInsights
def analyze(handle):
    twitter_consumer_key = '<key>'
    twitter_consumer_secret = '<secret>'
    twitter_access_token = '<token>'
    twitter_access_secret = '<secret>'
    twitter_api = twitter.Api(
        consumer_key=twitter_consumer_key,
        consumer_secret=twitter_consumer_secret,
        access_token_key=twitter_access_token,
        access_token_secret=twitter_access_secret)
    statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False)
    text = ""
    for status in statuses:
        if (status.lang == 'en'):  # English tweets
            text += status.text.encode('utf-8')
    # The IBM Bluemix credentials for Personality Insights!
    pi_username = '<username>'
    pi_password = '<password>'
    personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
    pi_result = personality_insights.profile(text)
    return pi_result
def flatten(orig):
    data = {}
    for c in orig['tree']['children']:
        if 'children' in c:
            for c2 in c['children']:
                if 'children' in c2:
                    for c3 in c2['children']:
                        if 'children' in c3:
                            for c4 in c3['children']:
                                if (c4['category'] == 'personality'):
                                    data[c4['id']] = c4['percentage']
                        if 'children' not in c3:
                            if (c3['category'] == 'personality'):
                                data[c3['id']] = c3['percentage']
    return data
def compare(dict1, dict2):
    compared_data = {}
    for keys in dict1:
        if dict1[keys] != dict2[keys]:
            compared_data[keys] = abs(dict1[keys] - dict2[keys])
    return compared_data
user_handle = "@itsguppythegod"
celebrity_handle = "@giselleee_____"
user_result = analyze(user_handle)
celebrity_result = analyze(celebrity_handle)
user = flatten(user_result)
celebrity = flatten(celebrity_result)
compared_results = compare(user, celebrity)
sorted_result = sorted(compared_results.items(), key=operator.itemgetter(1))
for keys, value in sorted_result[:5]:
    print(keys, end=" ")
    print(user[keys], end=" ")
    print('->', end=" ")
    print(celebrity[keys], end=" ")
    print('->', end=" ")
    print(compared_results[keys])
You created a str (Unicode text) object here:
text = ""
and then proceed to append UTF-8 encoded bytes:
text += status.text.encode('utf-8')
In Python 2, "" created a bytestring, so appending encoded bytes to it worked (although you were then posting UTF-8 bytes to a service that interprets the data as Latin-1; see the API documentation). In Python 3, "" is Unicode text, and bytes cannot be appended to it.
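The mismatch can be reproduced without Twitter or Watson at all. The following is a minimal sketch for Python 3; the variable names `encoded`, `mixed_ok`, and `payload` are made up for illustration and do not appear in the question's script.

```python
# Python 3: str holds Unicode text, bytes holds encoded data; the two do not mix.
text = ""
encoded = "café".encode("utf-8")  # a bytes object

try:
    text += encoded               # same kind of operation as in analyze()
    mixed_ok = True
except TypeError:
    mixed_ok = False              # Python 3 refuses to append bytes to str

# Keeping everything as str works; encode once at the end if bytes are needed.
text += "café"
payload = text.encode("utf-8")
print(mixed_ok, payload)
```

The exact wording of the TypeError message differs between Python 3 releases, so the sketch only checks that the exception is raised.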
To fix this, don't encode the status texts until you are done collecting all the tweets. In addition, tell Watson to expect UTF-8 data. Last but not least, you should build a list of tweet texts first and concatenate them in one step afterwards with str.join(), as concatenating strings in a loop takes quadratic time:
text = []
for status in statuses:
    if (status.lang == 'en'):  # English tweets
        text.append(status.text)
# ...
personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
pi_result = personality_insights.profile(
    ' '.join(text).encode('utf8'),
    content_type='text/plain; charset=utf-8'
)
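To try the collect-then-join step without calling either API, here is a rough sketch that uses stand-in objects. The `SimpleNamespace` instances and the helper name `collect_english_text` are assumptions made for this example; they only model the `.lang` and `.text` attributes that the question's code reads from each status.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the status objects returned by GetUserTimeline();
# only the .lang and .text attributes used in the question are modeled.
statuses = [
    SimpleNamespace(lang="en", text="Hello world"),
    SimpleNamespace(lang="fr", text="Bonjour"),
    SimpleNamespace(lang="en", text="Naïve café"),
]


def collect_english_text(statuses):
    """Join the text of all English statuses into one str (hypothetical helper)."""
    return " ".join(s.text for s in statuses if s.lang == "en")


text = collect_english_text(statuses)  # still str, safe to inspect or log
payload = text.encode("utf-8")         # encode once, just before sending
print(type(text).__name__, type(payload).__name__)
```

Keeping the text as str until the last moment means the encoding decision is made in exactly one place, right where the data leaves the program.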