Tags: python, utf-8, character-encoding, utf-16

Python - Python 3.1 can't seem to handle UTF-16 encoded files?


I'm trying to run some code that walks through a bunch of files and writes the contents of those that happen to be .txt files into a single output file, collapsing the extra whitespace. Here's some simple code that should do the trick:

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if '.txt' in file:
            f = open(subdir+'/'+file, 'r')
            line = f.readline()
            while line:
                line2 = line.split()
                if line2:
                    output_file.write(" ".join(line2)+'\n')
                line = f.readline()
            f.close()

But instead, I get the following error:

  File "/usr/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfe in position 0: unexpected code byte

It turns out these .txt files are all in UTF-16 (according to Firefox, at any rate). I thought Python 3.x was supposed to be able to handle any sort of character encoding?

Best, Georgina


Solution

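  • Use open(bla, 'r', encoding="utf-16"), where bla is the path of the file. Python 3 can decode UTF-16, but in text mode open() falls back to the platform's default encoding (UTF-8 here) unless you pass one explicitly. The 0xfe byte at position 0 is the start of a UTF-16 byte order mark, which the utf-16 codec uses to determine the byte order and then strips.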
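
Applied to the loop from the question, the fix might look like the sketch below. The function name merge_txt_files and its parameters are made up for illustration, writing the output as UTF-8 is an assumption (the question does not say how output_file was opened), and the check that skips the output file is an extra precaution in case it lives under rootdir.

```python
import os


def merge_txt_files(rootdir, output_path, encoding="utf-16"):
    """Copy the lines of every .txt file under rootdir into output_path,
    collapsing runs of whitespace and dropping blank lines."""
    output_abs = os.path.abspath(output_path)
    # Assumption: the merged file is written as UTF-8.
    with open(output_path, "w", encoding="utf-8") as output_file:
        for subdir, dirs, files in os.walk(rootdir):
            for name in files:
                if not name.endswith(".txt"):
                    continue
                path = os.path.join(subdir, name)
                # Do not read the output file back into itself.
                if os.path.abspath(path) == output_abs:
                    continue
                # Passing the encoding explicitly is the actual fix;
                # the utf-16 codec reads and strips the byte order mark.
                with open(path, "r", encoding=encoding) as f:
                    for line in f:
                        words = line.split()
                        if words:
                            output_file.write(" ".join(words) + "\n")
```

A call such as merge_txt_files(rootdir, "merged.out") would then replace the original loop. If some of the files turn out not to be UTF-16, the same UnicodeDecodeError (or garbled text) can come back, so the encoding argument has to match what is actually on disk.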