I'm trying to efficiently read in, and parse, a compressed text file using the gzip module. This link suggests wrapping the gzip file object with io.BufferedReader, like so:
import gzip, io
gz = gzip.open(in_path, 'rb')
f = io.BufferedReader(gz)
for line in f.readlines():
    # do stuff
gz.close()
To do this in Python 3, I think gzip.open must be called with mode='rb', so each line comes back as a bytes object. However, I need line to be a text/ASCII string. Is there a more efficient way to read the file in as text using BufferedReader, or will I have to decode line inside the for loop?
You can use io.TextIOWrapper to wrap the binary stream as a text stream instead, so that each line is decoded to str for you:
f = io.TextIOWrapper(gz)
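As a rough, self-contained sketch of the wrapping approach: the helper name read_lines_wrapped, the sample file, and the choice of encoding="ascii" are assumptions made for illustration, not part of the original question. Passing an explicit encoding avoids depending on the platform default.

```python
import gzip
import io
import os
import tempfile


def read_lines_wrapped(path, encoding="ascii"):
    """Read a gzip file line by line as str using io.TextIOWrapper.

    Hypothetical helper for illustration only.
    """
    with gzip.open(path, "rb") as gz:
        # TextIOWrapper decodes the bytes and yields str lines.
        with io.TextIOWrapper(gz, encoding=encoding) as f:
            return [line.rstrip("\n") for line in f]


# Write a tiny sample archive so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(sample_path, "wb") as out:
    out.write(b"alpha\nbeta\n")

lines = read_lines_wrapped(sample_path)
print(lines)
```

Closing the TextIOWrapper also closes the underlying gzip file, so no separate gz.close() call is needed here.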
Or, as @ShadowRanger pointed out, you can simply open the gzip file in text mode instead, so that the gzip module applies the io.TextIOWrapper wrapper for you:
for line in gzip.open(in_path, 'rt'):
    # do stuff
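A slightly fuller sketch of the text-mode variant, using a with block so the file is closed deterministically and an explicit encoding (in text mode, leaving it out falls back to the platform default, as with the built-in open). The function name iter_text_lines and the sample data are hypothetical.

```python
import gzip
import os
import tempfile


def iter_text_lines(path, encoding="ascii"):
    """Yield decoded lines from a gzip file opened in text mode.

    Hypothetical helper for illustration only.
    """
    # 'rt' makes gzip.open return a text stream that yields str lines.
    with gzip.open(path, "rt", encoding=encoding) as f:
        for line in f:
            yield line.rstrip("\n")


# Write a tiny sample archive so the sketch runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt.gz")
with gzip.open(demo_path, "wt", encoding="ascii") as out:
    out.write("first\nsecond\nthird\n")

result = list(iter_text_lines(demo_path))
print(result)
```

Iterating over the file object directly streams one line at a time, which avoids holding the whole file in memory the way readlines() does.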