python utf8-decode

String conversion from input file


I'm new to Python and need a hand getting this code to work.

This code works correctly; it converts the string as I need:

# -*- coding: utf-8 -*-
import sys
import arabic_reshaper
from bidi.algorithm import get_display

reshaped_text = arabic_reshaper.reshape(u' الحركات')
bidi_text = get_display(reshaped_text)
print >>open('out', 'w'), reshaped_text.encode('utf-8') # This is ok

I get the following error when I try to read the string from a file:

# -*- coding: utf-8 -*-
import sys
import arabic_reshaper
from bidi.algorithm import get_display

with open("/home/nemo/Downloads/mpcabd-python-arabic-reshaper-552f3f4/data.txt", "r") as myfile:
    data = myfile.read().replace('\n', '')
reshaped_text = arabic_reshaper.reshape(data)
bidi_text = get_display(reshaped_text)
print >>open('out', 'w'), reshaped_text.encode('utf-8')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128).

Any help is appreciated.

Thanks


Solution

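  • The decode() method decodes a byte string using the codec registered for the given encoding. If no encoding is given, it uses the default string encoding, which is ASCII in Python 2.

    In Python 2, myfile.read() returns a byte string. When those bytes are used where Unicode text is expected, Python decodes them implicitly with the ASCII codec, which fails on the first non-ASCII byte (0xd8 here). When you read a UTF-8 encoded file, decode the result explicitly with .decode('utf-8').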

    Write:

    data = u'my data'  # unicode literal, so non-ASCII text also encodes cleanly
    with open("file.txt" , "w") as f:
        f.write(data.encode('utf-8'))  # write the text as UTF-8 bytes
    

    Read:

    with open("file.txt" , "r") as f:
        data = f.read().decode('utf-8')
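
As an alternative to calling encode() and decode() by hand, the standard library's io.open can do the conversion while reading and writing. The sketch below is only an illustration under some assumptions: the helper names read_utf8 and write_utf8 and the temporary data.txt file are made up for the demo, and the Arabic sample is the word from the question. It also reproduces the ASCII decoding failure on the raw bytes.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_utf8(path):
    """Return the file's contents as decoded text with newlines removed."""
    # io.open decodes while reading, so no separate .decode('utf-8') is needed.
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read().replace(u"\n", u"")


def write_utf8(path, text):
    """Write unicode text to the file, encoding it as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Round trip through a temporary file (hypothetical name, demo only).
sample = u"الحركات"
demo_path = os.path.join(tempfile.mkdtemp(), "data.txt")
write_utf8(demo_path, sample + u"\n")
data = read_utf8(demo_path)

# The same bytes decoded as ASCII fail, as in the question's traceback.
raw = sample.encode("utf-8")
error_message = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
```

Here data is already decoded text, so it should be usable as the argument to arabic_reshaper.reshape(data) in the question's script without a further decode step.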