I am using Python 3.6.4 on Windows 10 with the Fall Creators Update. I am attempting to read an XML file using the following code:
    with open('file.xml', 'rt', encoding='utf8') as file:
        for line in file.readline():
            do_something(line)
readline() is returning a single character on each call, not a complete line. The file was produced on Linux, is definitely encoded as UTF8, has nothing special such as a BOM at the beginning and has been verified with a hex dump to contain valid data. The line end is 0x0a since it comes from Linux. I tried specifying -1 as the argument to readline(), which should be the default, without any change in behavior. The file is very large (>240GB) but the problem is occurring at the start of the file.
Any suggestions as to what I might be doing wrong?
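readline() returns a single line as a string, and your for-loop then iterates over that string one character at a time, which is why you get a single character on each pass. You could use readlines() instead, as this will give you a list of lines which your for-loop will iterate over, one line at a time. Be aware, though, that readlines() reads the whole file into memory first, which is not practical for a file of this size.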
Even better, and more efficient:
    for line in file:
        do_something(line)
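To see the difference side by side, here is a minimal sketch that runs both loops against a tiny temporary file. The file name sample.xml, its contents, and the helper names chars_from_readline and lines_from_file are made up for illustration; they stand in for the real file and for do_something.

```python
import os
import tempfile


def chars_from_readline(path):
    """Reproduce the problem: loop over the string returned by readline()."""
    with open(path, 'rt', encoding='utf8') as file:
        # readline() returns ONE line as a str; iterating a str yields characters.
        return [item for item in file.readline()]


def lines_from_file(path):
    """Loop over the file object itself, which yields one line per iteration."""
    with open(path, 'rt', encoding='utf8') as file:
        # The file object is a lazy iterator, so only one line is held at a time.
        return [line for line in file]


# Hypothetical sample data standing in for the real XML file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, 'sample.xml')
    # newline='\n' keeps Linux-style 0x0a line endings, even on Windows.
    with open(sample_path, 'wt', encoding='utf8', newline='\n') as out:
        out.write('<a>\n<b/>\n</a>\n')
    buggy = chars_from_readline(sample_path)
    fixed = lines_from_file(sample_path)

print(buggy)  # ['<', 'a', '>', '\n']
print(fixed)  # ['<a>\n', '<b/>\n', '</a>\n']
```

The lists are only built here so the results can be printed. With a file as large as yours, you would call do_something(line) inside the loop rather than collecting the lines.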