
Why does Python not read the end of line character when opening a utf-16 file?


I'm concatenating two text files, one of which is utf-16. When I read the lines from the files and split them, the lines from the utf-16 file have no end of line. Everything ends up on one line, so I have to add an end of line character myself. Any ideas why?

The code below works, but I want to know why I need to add an end of line for the utf-16 file.

with open(file_temp, 'w') as outfile:
    with open(file_normal) as infile:
        for line in infile:
            outfile.write(line.split(",")[0]) # auto end of line
    with open(file_utf16, encoding='utf-16') as infile: # different file format
        for line in infile:
            outfile.write(line.split(",")[0] + "\n") # needs end of line char for some reason ?

I expected the end of line character to be present in the utf-16 file when reading it with the correct encoding.


Solution

  • The newline has nothing to do with the encoding as such. Writing the same data to a utf-16 file and to a normal file gives the same text:

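    # Write the digits 0-9 to a utf-16 file, with no newline between them
    with open("someFile_utf16.txt", "w", encoding="utf-16") as outfile:
        for x in range(10):
            outfile.write(str(x))

    # Write the same digits to a file in the platform's default encoding
    with open("someFile_normal.txt", "w") as outfile:
        for x in range(10):
            outfile.write(str(x))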
    

    Both files contain the same text:

    0123456789
    

    The most likely explanation is that the end of line characters were present in the data you got from the normal file but not in the data you got from the utf-16 file. Decoding utf-16 does not remove them.

    For more on reading and writing files, see:

    https://docs.python.org/3/tutorial/inputoutput.html
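A minimal sketch to check this, using hypothetical sample files created in a temporary directory. Since the question does not show its data, it assumes the utf-16 file is comma-separated while the lines of the normal file contain no comma; the normal file is written as utf-8 only to keep the example deterministic. It shows that lines read from a utf-16 file still end in "\n", and that line.split(",")[0] keeps that "\n" only when the line has no comma, because otherwise the newline stays attached to the last field.

```python
import os
import tempfile

# Hypothetical sample data (the question does not show its files):
# the "normal" lines have no comma, the utf-16 lines are comma-separated.
normal_lines = ["alpha\n", "beta\n"]
utf16_lines = ["gamma,1\n", "delta,2\n"]

with tempfile.TemporaryDirectory() as tmpdir:
    normal_path = os.path.join(tmpdir, "normal.txt")
    utf16_path = os.path.join(tmpdir, "utf16.txt")

    # Write both files; each line already ends with "\n".
    with open(normal_path, "w", encoding="utf-8") as f:
        f.writelines(normal_lines)
    with open(utf16_path, "w", encoding="utf-16") as f:
        f.writelines(utf16_lines)

    # Read them back line by line, as the loops in the question do.
    with open(normal_path, encoding="utf-8") as f:
        read_normal = list(f)
    with open(utf16_path, encoding="utf-16") as f:
        read_utf16 = list(f)

# Both files yield lines that still end in "\n": decoding utf-16 keeps it.
print(read_normal)  # ['alpha\n', 'beta\n']
print(read_utf16)   # ['gamma,1\n', 'delta,2\n']

# split(",")[0] returns the whole line (newline included) when there is
# no comma; with a comma, the newline stays in the last field and is lost.
first_normal = [line.split(",")[0] for line in read_normal]
first_utf16 = [line.split(",")[0] for line in read_utf16]
print(first_normal)  # ['alpha\n', 'beta\n']
print(first_utf16)   # ['gamma', 'delta']


def first_field(line):
    """Hypothetical helper: first comma-separated field plus exactly one newline."""
    return line.rstrip("\r\n").split(",")[0] + "\n"


print([first_field(line) for line in read_normal + read_utf16])
# ['alpha\n', 'beta\n', 'gamma\n', 'delta\n']
```

If that assumption about the data holds, stripping the newline before splitting and adding exactly one back (as the hypothetical first_field helper does) makes both loops in the question behave the same, whichever file a line comes from.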