Tags: python, python-3.x, encoding, decoding, python-3.6

Python 3.6 - Read encoded text from file and convert to string


Hopefully someone can help me out with the following. It is probably not too complicated but I haven't been able to figure it out. My "output.txt" file is created with:

f = open('output.txt', 'w')
print(tweet['text'].encode('utf-8'), file=f)
print(tweet['created_at'][0:19].encode('utf-8'), file=f)
print(tweet['user']['name'].encode('utf-8'), file=f)
f.close()

If I don't encode it before writing to the file, it gives me errors. So "output.txt" contains 3 rows of UTF-8 encoded output:

b'testtesttest'
b'line2test'
b'\xca\x83\xc9\x94n ke\xc9\xaan'
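
For context, the b'...' wrapper shows up because converting a bytes object to text (which is what print() does) produces its literal representation, not the decoded characters. A minimal sketch, using the third row above as sample data; the variable names are only for illustration:

```python
# Sample value: the third row shown above, decoded back to text.
name = b'\xca\x83\xc9\x94n ke\xc9\xaan'.decode('utf-8')

encoded = name.encode('utf-8')  # a bytes object, not a str
as_text = str(encoded)          # the text print() emits for a bytes object

# as_text holds the literal characters b'...' with escaped bytes,
# so a file written this way stores the representation, not the text.
print(as_text)
```

Reading such a file back with encoding="utf-8" works without errors, because the b'...' text is plain ASCII, but it returns exactly those literal characters.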

In "main.py", I am trying to convert this back to a string:

f = open("output.txt", "r", encoding="utf-8")
text = f.read()
print(text)
f.close()

Unfortunately, the b'' format is still there when I print the text. Do I still need to decode it? If possible, I would like to keep the 3-row structure. My apologies for the newbie question, this is my first one on SO :)

Thank you so much in advance!
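
If a file has already been written in the b'...' format, one possible way to recover the text is to parse each line as a Python bytes literal with ast.literal_eval and then decode it. This is only a sketch under the assumption that every line is a well-formed bytes literal like the three rows above; the helper name recover is hypothetical:

```python
import ast

# The three rows exactly as they appear in the existing output.txt.
raw_lines = [
    r"b'testtesttest'",
    r"b'line2test'",
    r"b'\xca\x83\xc9\x94n ke\xc9\xaan'",
]

def recover(line):
    """Parse a b'...' literal back into bytes, then decode it as UTF-8."""
    value = ast.literal_eval(line.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal, got %r" % (line,))
    return value.decode("utf-8")

# One recovered string per row, so the 3-row structure is preserved.
recovered = [recover(line) for line in raw_lines]
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text read from a file.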


Solution

  • With the help of the people answering my question, I have been able to get it to work. The solution is to change how the file is written:

         import json

         # data holds the raw JSON for one tweet (not shown here)
         tweet = json.loads(data)
         tweet_text = tweet['text']  # content of the tweet
         tweet_created_at = tweet['created_at'][0:19]  # tweet created at
         tweet_user = tweet['user']['name']  # tweet created by
         # Open in text mode with an explicit encoding and write str, not bytes
         with open('output.txt', 'w', encoding='utf-8') as f:
             f.write(tweet_text + '\n')
             f.write(tweet_created_at + '\n')
             f.write(tweet_user + '\n')
    

    Then read it back like this:

        with open("output.txt", "r", encoding='utf-8') as f:
            tweettext = f.read()
        print(tweettext)
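
    To check that this approach keeps the 3-row structure, here is a small self-contained round trip. It is a sketch, not the original script: the rows list is hypothetical sample data standing in for the tweet fields, the file goes to a temporary directory, and it assumes none of the fields contains a newline itself.

```python
import os
import tempfile

# Hypothetical stand-ins for tweet text, creation time and user name.
rows = [
    "testtesttest",
    "line2test",
    b"\xca\x83\xc9\x94n ke\xc9\xaan".decode("utf-8"),
]

# Write to a temporary location so no real output.txt is overwritten.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

# Text mode with an explicit encoding: write str objects, one per line.
with open(path, "w", encoding="utf-8") as f:
    for row in rows:
        f.write(row + "\n")

# Read back with the same encoding, dropping the trailing newline per row.
with open(path, "r", encoding="utf-8") as f:
    read_back = [line.rstrip("\n") for line in f]
```

    After this runs, read_back holds the same three strings as rows, with no b'' wrapper, because the encoding is handled by open() instead of by calling .encode() on each value.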