
Fast reading of gzip (text file) using io.BufferedReader in Python 3


I'm trying to efficiently read and parse a compressed text file using the gzip module. This link suggests wrapping the gzip file object with io.BufferedReader, like so:

import gzip, io

gz = gzip.open(in_path, 'rb')
f = io.BufferedReader(gz)
for line in f:  # iterate lazily; f.readlines() would load the whole file into memory
    pass  # do stuff with line (a bytes object here)
gz.close()

To do this in Python 3, I think gzip must be opened in binary mode (mode='rb'), so each line comes back as a bytes object. However, I need line to be a text (str) string. Is there a more efficient way to read the file as text while still using BufferedReader, or will I have to decode line inside the for loop?
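For comparison, here is a minimal sketch of the decode-inside-the-loop approach the question asks about, assuming the file is UTF-8 (which also covers plain ASCII). The helper name read_lines_decoding and the sample file are hypothetical, created only so the snippet runs on its own.

```python
import gzip
import io
import os
import tempfile


def read_lines_decoding(path, encoding="utf-8"):
    """Read a gzip text file in binary mode and decode each line by hand."""
    lines = []
    with gzip.open(path, "rb") as gz:
        # Wrapping in BufferedReader mirrors the question's code.
        f = io.BufferedReader(gz)
        for raw in f:  # raw is a bytes object, e.g. b"alpha\n"
            lines.append(raw.decode(encoding))  # convert bytes -> str
    return lines


# Hypothetical sample data written to a temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(sample_path, "wb") as out:
    out.write(b"alpha\nbeta\ngamma\n")

print(read_lines_decoding(sample_path))
```

This works, but every line pays for a separate decode call in Python code, which is the overhead the answer below avoids.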


Solution

  • You can use io.TextIOWrapper to wrap the binary gzip stream in a text stream instead, so each line comes back as a str:

    f = io.TextIOWrapper(gz, encoding='utf-8')  # without encoding=, the locale's default is used
    

    Or, as @ShadowRanger pointed out, you can simply open the gzip file in text mode, and the gzip module will wrap it in io.TextIOWrapper for you:

    with gzip.open(in_path, 'rt', encoding='utf-8') as f:
        for line in f:
            pass  # do stuff with line (a str)
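A self-contained sketch comparing the two approaches from the answer, assuming UTF-8 content. The helper names and the sample file are hypothetical, added only so the snippet runs on its own; both helpers should return the same list of str lines.

```python
import gzip
import io
import os
import tempfile


def read_with_textiowrapper(path, encoding="utf-8"):
    """Open the gzip file in binary mode and wrap it in io.TextIOWrapper."""
    with gzip.open(path, "rb") as gz:
        # Closing the wrapper also closes gz; the outer close is then a no-op.
        with io.TextIOWrapper(gz, encoding=encoding) as f:
            return [line for line in f]


def read_with_rt_mode(path, encoding="utf-8"):
    """Let gzip.open build the TextIOWrapper itself via mode 'rt'."""
    with gzip.open(path, "rt", encoding=encoding) as f:
        return [line for line in f]


# Hypothetical sample data; the second line uses a Windows-style ending.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(sample_path, "wb") as out:
    out.write("alpha\nbéta\r\ngamma\n".encode("utf-8"))

a = read_with_textiowrapper(sample_path)
b = read_with_rt_mode(sample_path)
print(a == b, a)
```

One behavioral difference from decoding each bytes line yourself: TextIOWrapper uses universal-newline mode by default, so "\r\n" is read back as "\n". Passing newline='' (accepted by both io.TextIOWrapper and gzip.open in text mode) keeps line endings untranslated.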