
Better to read the whole file, close it, and then loop over it, or loop over it while it's open?


I was wondering which of these is the better and safer way to process a file's contents line by line. The assumption here is that the file's contents are very critical, but the file is not very large, so memory consumption is not an issue.

Is it better to close the file as soon as possible using this:

with open('somefile.txt') as f:
    lines = f.readlines()

for line in lines:
    do_something(line)

Or to just loop over it in one go:

with open('somefile.txt') as f:
    for line in f:
        do_something(line)

Which of these practices is generally the more accepted way of doing it?


Solution

  • There is no "better" solution, simply because these two are far from equivalent.

    The first one loads the entire file into memory and then processes the in-memory data. Depending on what the processing is, this can be faster. Note that if the file is bigger than the amount of RAM you have, this is not an option at all (and a list of Python strings takes noticeably more memory than the raw file size).

    The second one loads only a piece of the file into memory, processes it, then loads the next piece, and so on. This is generally slower (although you likely won't see the difference, because the processing time, especially in Python, often dominates the reading time), but it drastically reduces memory consumption, assuming your file has more than one line. In some cases it can also be more difficult to work with. For example, say you are looking for the pattern "xy\nz" in the file. With line-by-line loading you have to remember the previous line to do a correct check, which is more difficult to implement (though only slightly). So again: it depends on what you are doing.

    As you can see, there are tradeoffs, and what is better depends on your context. I often do this: if the file is relatively small (say, up to a few hundred megabytes), I load it into memory.

    Now, you've mentioned that the content is "critical". I don't know what that means, but if, for example, you are trying to make updates to the file atomic or keep reads consistent between processes, then that is a very different problem from the one you've posted. It is also generally hard, so I advise using a proper database. SQLite is an easy option (again, depending on your scenario), and like a plain file it keeps everything in a single file on disk.
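A minimal sketch of the first approach (load everything, then process), assuming a small text file readable with the default encoding. The names `load_lines` and `demo_path` and the temporary demo file are illustrative, not from the answer. It shows that the file is already closed by the time the lines are processed, and that in-memory data can be traversed more than once:

```python
import os
import tempfile


def load_lines(path):
    """Read every line into a list; the file is closed when the with-block exits."""
    with open(path) as f:
        lines = f.readlines()
    # The file object is closed here, but the list is still in memory.
    assert f.closed
    return lines


# Hypothetical demo file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    demo_path = tmp.name

lines = load_lines(demo_path)
os.remove(demo_path)  # the data survives because it was copied into memory

# Because the data is in memory, it can be traversed several times
# (or indexed at random) without touching the file again.
print(len(lines))                                       # 3
print([line.rstrip("\n") for line in reversed(lines)])  # ['gamma', 'beta', 'alpha']
```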
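The "xy\nz" example from the answer can be sketched both ways. This assumes the pattern is exactly "xy", a newline, then "z", and the function names are hypothetical. The streaming version only keeps the previous line around, while the in-memory version is a single substring test:

```python
import os
import tempfile


def has_xy_newline_z_streaming(path):
    """Check for "xy\\nz" while holding only the previous line in memory."""
    prev = ""
    with open(path) as f:
        for line in f:
            # The pattern spans a line break: the previous line must end with
            # "xy" plus its newline, and the current line must start with "z".
            if prev.endswith("xy\n") and line.startswith("z"):
                return True
            prev = line
    return False


def has_xy_newline_z_in_memory(path):
    """Same check with the whole file loaded: a plain substring test."""
    with open(path) as f:
        text = f.read()
    return "xy\nz" in text


def _write_temp(content):
    # Hypothetical helper that writes demo content to a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(content)
        return tmp.name


results = {}
for content in ["abc\nxy\nz\n", "xy\nq\nz\n", "axy\nzz"]:
    path = _write_temp(content)
    results[content] = (has_xy_newline_z_streaming(path),
                        has_xy_newline_z_in_memory(path))
    os.remove(path)

print(results)
```

Both functions should agree on every input; the streaming one just needs the extra `prev` bookkeeping.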
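One way to encode the rule of thumb "load it if it's small, stream it otherwise" is to check the file size first. `SIZE_LIMIT_BYTES` and `iter_lines` are hypothetical names, and the 200 MB cut-off is only an example value in the "few hundred megabytes" range. Keep in mind that `os.path.getsize` reports bytes on disk, while the decoded lines take more memory than that:

```python
import os
import tempfile

# Hypothetical cut-off; tune it to your machine and workload.
SIZE_LIMIT_BYTES = 200 * 1024 * 1024


def iter_lines(path, limit=SIZE_LIMIT_BYTES):
    """Yield the file's lines: load it whole if small, stream it otherwise."""
    if os.path.getsize(path) <= limit:
        with open(path) as f:
            lines = f.readlines()  # file is closed before any line is yielded
        yield from lines
    else:
        with open(path) as f:
            yield from f           # file stays open while the caller iterates


# Hypothetical demo file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\n")
    demo_path = tmp.name

loaded = list(iter_lines(demo_path))             # small file: loaded at once
streamed = list(iter_lines(demo_path, limit=0))  # forces the streaming branch
os.remove(demo_path)
print(loaded == streamed)  # True
```

Because `iter_lines` is a generator, the streaming branch keeps the file open until the caller finishes iterating (or the generator is closed), which is exactly the tradeoff the question asks about.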
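If "critical" means that updates must be all-or-nothing, a database transaction provides that. Below is a minimal sketch using Python's built-in `sqlite3` module. The `records` table and its columns are hypothetical, and an in-memory database is used so the example is self-contained (in practice you would pass a file path such as "data.db"):

```python
import sqlite3

# In-memory database for the demo; use a file path in practice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value TEXT NOT NULL)")

# Using the connection as a context manager commits on success and
# rolls back if an exception escapes the block.
with conn:
    conn.executemany("INSERT INTO records (value) VALUES (?)",
                     [("first",), ("second",)])

try:
    with conn:
        conn.execute("INSERT INTO records (value) VALUES (?)", ("third",))
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass  # the "third" insert was rolled back

rows = [value for (value,) in conn.execute("SELECT value FROM records ORDER BY id")]
print(rows)  # ['first', 'second']
conn.close()
```

The `with conn:` block only manages the transaction; it does not close the connection, hence the explicit `conn.close()` at the end.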