I'm trying to figure out the best way of using a generator when parsing a file line by line. Which use of the generator expression would be better?
First option:
with open('some_file') as file:
    lines = (line for line in file)
Second option:
lines = (line for line in open('some_file'))
I know both will produce the same results, but which one will be faster or more efficient?
You can't combine generators and context managers (with statements) this way.
Generators are lazy. They will not actually read their source data until something requests an item from them.
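You can watch that laziness directly. A minimal sketch (the noisy() helper is purely illustrative):

def noisy(value):
    # Report when a value is actually produced.
    print(f"producing {value!r}")
    return value

gen = (noisy(n) for n in range(3))  # nothing is printed yet
print("generator created")
first = next(gen)                   # only now does "producing 0" appear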
This appears to work:
with open('some_file') as file:
    lines = (line for line in file)
but when you actually try to read a line later in your program:
for line in lines:
    print(line)
it will fail with ValueError: I/O operation on closed file.
This is because the context manager has already closed the file - that's its sole purpose in life - and the generator had not started reading it until the for loop actually requested data.
Your second suggestion
lines = (line for line in open('some_file'))
suffers from the opposite problem. You open() the file, but unless you close() it manually (which you can't easily do, because you never kept a reference to the file object), it stays open until the garbage collector happens to finalize it - and you can't rely on when that happens. That's the very situation that context managers fix.
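For completeness: if you do keep your own reference to the file object, you can close it by hand. A sketch of that manual bookkeeping (which with would otherwise do for you):

file = open('some_file')
lines = (line for line in file)
try:
    for line in lines:
        print(line)
finally:
    file.close()  # guaranteed cleanup, done by hand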
Overall, if you want to read the file, you can either ... read the file, eagerly pulling every line into a list at once:
with open('some_file') as file:
    lines = list(file)
or you can use a real generator:
def lazy_reader(*args, **kwargs):
    with open(*args, **kwargs) as file:
        yield from file
and then you can do
for line in lazy_reader('some_file', encoding="utf8"):
    print(line)
and lazy_reader() will close the file once the last line has been read.
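One caveat: if the consuming loop exits early (via break, return, or an exception), the generator is only finalized when it gets garbage-collected, so the file may linger open. Wrapping the generator in contextlib.closing() makes the cleanup deterministic - a sketch:

from contextlib import closing

# lines.close() raises GeneratorExit inside lazy_reader, which unwinds
# its 'with' block and closes the file as soon as this block exits.
with closing(lazy_reader('some_file', encoding="utf8")) as lines:
    for line in lines:
        if not line.strip():
            break  # safe: the file is still closed promptly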