python-3.x readline

readline() returns a character at a time


I am using Python 3.6.4 on Windows 10 with Fall Creators Update. I am attempting to read an XML file using the following code:

with open('file.xml', 'rt', encoding='utf8') as file:
    for line in file.readline():
        do_something(line)

readline() is returning a single character on each call, not a complete line. The file was produced on Linux, is definitely encoded as UTF-8, has nothing special such as a BOM at the beginning, and has been verified with a hex dump to contain valid data. The line ending is 0x0a, since the file comes from Linux. I tried passing -1 as the argument to readline(), which should be the default, without any change in behavior. The file is very large (>240 GB), but the problem occurs at the start of the file.

Any suggestions as to what I might be doing wrong?
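A minimal reproduction of the loop from the question, using `io.StringIO` as an in-memory stand-in for `file.xml` (the three-line sample content is invented for illustration), shows what the loop body actually receives:

```python
import io

# Hypothetical stand-in for file.xml: a small in-memory text file
# with Linux (\n) line endings.
sample = io.StringIO('<root>\n  <item>1</item>\n</root>\n')

# Same loop shape as in the question. readline() is evaluated once and
# returns the first line as a str; the for-loop then iterates over
# that str, so each iteration receives a single character.
chunks = []
for line in sample.readline():
    chunks.append(line)

print(chunks)  # ['<', 'r', 'o', 'o', 't', '>', '\n']
```

readline() is called only once here: the loop walks over the characters of the first line, and the remaining lines are never read.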


Solution

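  • readline() returns a single line as a string, and your for-loop then iterates over that string, which yields one character per iteration. Because readline() is only called once, only the first line is ever processed. You could use readlines() instead, which returns a list of lines that your for-loop will iterate over one line at a time, but it reads the entire file into memory first, which is impractical for a file larger than 240 GB.

    Even better, and more efficient, iterate over the file object itself, which reads lazily, one line at a time: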

        for line in file:
            do_something(line)
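The line-by-line options can be compared side by side in a small sketch. It assumes an in-memory `io.StringIO` in place of `open('file.xml', 'rt', encoding='utf8')`, and `do_something` is a hypothetical placeholder for the question's per-line processing. Besides iterating over the file object, it shows how readline() itself is meant to be used: called repeatedly until it returns an empty string at end of file.

```python
import io


def do_something(line):
    # Hypothetical placeholder for the question's per-line processing:
    # strip the trailing newline and return the text.
    return line.rstrip('\n')


def make_file():
    # In-memory stand-in for open('file.xml', 'rt', encoding='utf8').
    return io.StringIO('<root>\n  <item>1</item>\n</root>\n')


# 1. Iterate over the file object: lines are read lazily, one at a time,
#    so memory use does not grow with the size of the file.
with make_file() as file:
    lazy = [do_something(line) for line in file]

# 2. Call readline() in a loop. It returns '' only at end of file;
#    a blank line comes back as '\n', which is truthy, so it does not
#    stop the loop early.
with make_file() as file:
    looped = []
    while True:
        line = file.readline()
        if not line:
            break
        looped.append(do_something(line))

# 3. The same as 2, using the two-argument form of iter(), which keeps
#    calling file.readline until it returns the sentinel ''.
with make_file() as file:
    sentinel = [do_something(line) for line in iter(file.readline, '')]

print(lazy)  # ['<root>', '  <item>1</item>', '</root>']
```

All three produce the same lines; option 1 is the most idiomatic, while options 2 and 3 are useful when the reading loop needs finer control over when the next line is fetched.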