Search code examples
pythongzip

Counting the number of lines in a gzip file using python


I'm trying to count the number of lines in a gz archive. There is only 1 json format text file per gz. But when I open the archive and count the lines the count is way off what I'd expect. The file contains 522 lines, but my code is returning 668480 lines.

import gzip
f = gzip.open(myfile, 'rb')
file_content = f.read()
for i, l in enumerate(file_content):
    pass
i += 1
print("File {1} contain {0} lines".format(i, myfile))

Solution

  • You are iterating over all characters not the lines. You can iterate lines the following way

    import gzip
    with gzip.open(myfile, 'rb') as f:
        for i, l in enumerate(f):
            pass
    print("File {1} contain {0} lines".format(i + 1, myfile))