python, python-3.x, pythonista

Why can I print a good number of lines, but get an error once I reach a certain point?


I am trying to read and print each individual line of an RTF file. With my current code, this works until it reaches line 937. At that point it stops reading lines and gives me this error:

Traceback (most recent call last):
  File "/private/var/mobile/Library/Mobile Documents/iCloud~com~omz-software~Pythonista3/Documents/openFolders.py", line 8, in <module>
    for element in file:
  File "/var/containers/Bundle/Application/8F2965B6-AC1F-46FA-8104-6BB24F1ECB97/Pythonista3.app/Frameworks/Py3Kit.framework/pylib/encodings/ascii.py", line 27, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4250: ordinal not in range(128)

file = open("Steno Dictionary.rtf", "r")

#line_number is just to know what line number has been printed on the console.  
line_number = 1

for element in file:
    
    #print(line_number) prints until it reaches 937 and then the error occurs. 
    print(line_number)
    print(element)
    line_number +=1 

How would I modify my current code so that it keeps reading lines until the end of the file? There are still many more lines left. I have searched high and low and cannot figure it out. Thank you very much to whoever can help me out! As a note: I'm using Pythonista on iOS.


Solution

  • The error means that Python could not decode one of the bytes in the file (0xe9) using the default text encoding, which in your environment is ASCII.

    There are a few things you can try. The first is to check whether explicitly setting the encoding to UTF-8 works:

    file = open("Steno Dictionary.rtf", "r", encoding="utf-8")
    ...
    

    If that doesn't work, you can try other encodings (for example latin-1 or cp1252, in both of which the byte 0xe9 is "é"), or you can tell Python to replace the bytes it can't decode with a placeholder, like this:

    file = open("Steno Dictionary.rtf", "r", encoding="utf-8", errors="replace")
    ...
    

    That will decode everything it knows how to, and replace what it doesn't with ? characters.
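
    The two suggestions above can be combined into one helper. This is only a minimal sketch, not a definitive fix: the function name `read_lines` and the candidate list `("utf-8", "cp1252")` are assumptions made for illustration, since the real encoding of the file is not known from the question. It tries each candidate encoding strictly and falls back to `errors="replace"` only if every candidate fails.

```python
def read_lines(path, encodings=("utf-8", "cp1252")):
    """Return the lines of a text file, trying each candidate encoding in turn.

    The candidate encodings are an assumption for illustration; adjust them
    to whatever the file was really saved with.
    """
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as f:
                # Decoding happens while iterating, so a bad byte raises here.
                return [line.rstrip("\n") for line in f]
        except UnicodeDecodeError:
            # This encoding cannot represent the file; try the next one.
            continue

    # Last resort: decode as UTF-8 and substitute U+FFFD for undecodable bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f]


# Possible usage with the file from the question:
#
# for line_number, line in enumerate(read_lines("Steno Dictionary.rtf"), start=1):
#     print(line_number, line)
```

    Using `with` also closes the file automatically, and `enumerate(..., start=1)` replaces the manual `line_number` counter. Note that a wrong but permissive encoding can decode without error and still show the wrong characters, so it is worth checking a line that contains accented text.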