python python-3.x unicode linecache

How to use linecache with unicode?


I open my file thus:

with open(sourceFileName, 'r', encoding='ISO-8859-1') as sourceFile:

but when I call

previousLine = linecache.getline(sourceFileName, i - 1)

I get an exception:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 169: invalid start byte

This is because (I think) linecache.getline returns a str() (which does not have a decode() method).

My script must be able to support unicode, so I can't simply convert the input file to UTF-8.


Solution

  • linecache takes a filename, not a file object, as your usage shows, so the encoding you pass to open() never reaches it. It has no provision for specifying an encoding. Also, from the documentation:

    This is used by the traceback module to retrieve source lines for inclusion in the formatted traceback.

    This implies that it is meant mainly for Python source code, which is assumed to be UTF-8 unless the file declares otherwise. As it turns out, if the file has a Python source file encoding comment, it works:

    input.txt

    # coding: iso-8859-1
    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
    [\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»
    ¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
    

    test.py

    import linecache
    print(linecache.getline('input.txt', 3))
    

    Output

    [\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»
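
The encoding comment works because, as far as I can tell from the CPython 3 source, linecache reads files the way the interpreter reads source code: it looks for an encoding declaration (or a byte order mark) and otherwise assumes UTF-8. The sketch below uses the standard tokenize.detect_encoding function to show which encoding would be picked for a given file. The helper names detected_encoding and write_temp, and the sample file contents, are made up for illustration.

```python
import os
import tempfile
import tokenize


def detected_encoding(path):
    """Return the encoding tokenize would assume for `path` as Python source."""
    with open(path, 'rb') as f:
        encoding, _first_lines = tokenize.detect_encoding(f.readline)
    return encoding


def write_temp(data):
    """Hypothetical helper: write bytes to a temporary file, return its path."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return path


# Byte 0xb4 is the acute accent in ISO-8859-1 and is not valid UTF-8 on its own.
with_cookie = write_temp(b"# coding: iso-8859-1\nacute accent: \xb4\n")
without_cookie = write_temp(b"plain ascii first line\nacute accent: \xb4\n")

cookie_encoding = detected_encoding(with_cookie)
default_encoding = detected_encoding(without_cookie)
print(cookie_encoding)   # iso-8859-1
print(default_encoding)  # utf-8

os.remove(with_cookie)
os.remove(without_cookie)
```

A file without the comment is treated as UTF-8, which is consistent with the 'utf-8' codec error on byte 0xb4 in the question.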
    

    So linecache probably isn't the solution to your issue. Instead, open the file as you've shown and perhaps cache the lines yourself:

    with open('x.txt', encoding='iso-8859-1') as f:
        lines = f.readlines()  # cache every line, decoded as ISO-8859-1
    print(lines[2])  # list indices are 0-based, so this is line 3
    

    If you don't want to read the whole file up front, you could instead append lines to a list as they are read, which gives you a linecache-like lookup by line number.
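
One way to cache lines as they are read is sketched below. This is only an illustration, not part of linecache: the class name LazyLineCache and its getline/close methods are hypothetical, and it assumes line numbers are 1-based, as in linecache.getline, with an empty string returned for lines past the end of the file.

```python
import os
import tempfile


class LazyLineCache:
    """Cache lines of a text file as they are read, using an explicit encoding."""

    def __init__(self, path, encoding):
        self._file = open(path, encoding=encoding)
        self._lines = []  # lines read so far, in file order

    def getline(self, lineno):
        """Return line `lineno` (1-based), or '' if the file is shorter."""
        # Read only as far as needed; earlier lines stay cached.
        while len(self._lines) < lineno:
            line = self._file.readline()
            if not line:  # end of file reached
                break
            self._lines.append(line)
        if 1 <= lineno <= len(self._lines):
            return self._lines[lineno - 1]
        return ''

    def close(self):
        self._file.close()


# Demo with a throwaway ISO-8859-1 file (contents are made up).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write('first\nsecond \u00b4\nthird\n'.encode('iso-8859-1'))

cache = LazyLineCache(path, encoding='iso-8859-1')
second = cache.getline(2)    # reads lines 1 and 2 from disk
first = cache.getline(1)     # served from the cached list
missing = cache.getline(10)  # past the end of the file
cache.close()
os.remove(path)

print(repr(second))
```

Because the file is opened with the encoding you choose, previousLine = cache.getline(i - 1) would be decoded as ISO-8859-1 rather than guessed as UTF-8.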