
Generator comprehension with open function


I'm trying to figure out the best way of using a generator when parsing a file line by line. Which use of a generator comprehension would be better?

First option.

with open('some_file') as file:
    lines = (line for line in file)

Second option.

lines = (line for line in open('some_file'))

I know both will produce the same results, but which one will be faster / more efficient?


Solution

  • You can't combine a generator expression with a context manager (a with statement) this way.

    Generators are lazy. They will not actually read their source data until something requests an item from them.

    This appears to work:

    with open('some_file') as file:
        lines = (line for line in file)
    

    but when you actually try to read a line later in your program

    for line in lines:
        print(line)
    

    it will fail with ValueError: I/O operation on closed file.

    This is because the context manager has already closed the file - that's its sole purpose in life - and the generator does not start reading it until the for loop actually requests data.
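    The failure above can be reproduced without touching the filesystem; a minimal sketch, using io.StringIO as a stand-in for a real file handle:

    ```python
    import io

    # io.StringIO behaves like an open text file here.
    f = io.StringIO("first line\nsecond line\n")
    lines = (line for line in f)    # lazy: nothing has been read yet
    f.close()                       # what the with block does on exit

    try:
        next(lines)                 # the generator touches the file only now
        error = None
    except ValueError as exc:       # "I/O operation on closed file"
        error = exc
    print(error)
    ```

    The ValueError is raised at iteration time, not at the line where the generator expression was written, which is exactly why the bug is easy to miss.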

    Your second suggestion

    lines = (line for line in open('some_file'))
    

    suffers from the opposite problem. You open() the file, but unless you manually close() it (and you can't, because you kept no reference to the file handle), it stays open until the interpreter happens to finalize it - possibly for the rest of the program. That's the very situation that context managers fix.
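    If you do keep an explicit reference to the handle, closing it yourself becomes possible; a minimal sketch (the file name demo_lines.txt is hypothetical, created here only so the example is self-contained):

    ```python
    import os

    # Create a small scratch file for the demo.
    path = "demo_lines.txt"
    with open(path, "w") as f:
        f.write("first\nsecond\n")

    f = open(path)                  # keep an explicit reference to the handle
    lines = (line for line in f)    # still lazy: nothing is read yet
    result = [line.rstrip("\n") for line in lines]
    f.close()                       # now an explicit close is possible
    os.remove(path)
    print(result)                   # ['first', 'second']
    ```

    This works, but it reintroduces the bookkeeping (and the exception-safety gap) that the with statement exists to remove.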

    Overall, if you want to read the file, you can either ... read the file:

    with open('some_file') as file:
        lines = list(file)
    

    or you can use a real generator:

    def lazy_reader(*args, **kwargs):
        with open(*args, **kwargs) as file:
            yield from file
    

    and then you can do

    for line in lazy_reader('some_file', encoding="utf8"):
        print(line)
    

    and lazy_reader() will close the file when the last line has been read.
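    One caveat worth adding to the answer: if the caller abandons the loop early, the with block inside the generator is only unwound when the generator itself is closed or garbage-collected. Wrapping it in contextlib.closing() makes that deterministic; a sketch (the file name demo_early_exit.txt is hypothetical):

    ```python
    import contextlib
    import os

    def lazy_reader(*args, **kwargs):
        # The with block stays alive across yields, so the file closes
        # only when the generator finishes or is explicitly closed.
        with open(*args, **kwargs) as file:
            yield from file

    # Scratch file for the demo.
    path = "demo_early_exit.txt"
    with open(path, "w") as f:
        f.write("one\ntwo\nthree\n")

    # contextlib.closing() calls .close() on the generator when the block
    # exits; that raises GeneratorExit inside lazy_reader, which unwinds
    # its with block and closes the file even though we stopped early.
    with contextlib.closing(lazy_reader(path)) as lines:
        first = next(lines)

    os.remove(path)
    print(first.rstrip("\n"))       # one
    ```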