I have a series of strings in a list named 'lines' and I compress them as follows:
import bz2
compressor = bz2.BZ2Compressor(compressionLevel)
for l in lines:
    compressor.compress(l)
compressedData = compressor.flush()
decompressedData = bz2.decompress(compressedData)
When compressionLevel is set to 8 or 9, this works fine. When it's any number between 1 and 7 (inclusive), the final line fails with an IOError: invalid data stream. The same occurs if I use the sequential decompressor. However, if I join the strings into one long string and use the one-shot compressor function, it works fine:
import bz2
compressedData = bz2.compress("\n".join(lines))
decompressedData = bz2.decompress(compressedData)
# Works perfectly
Do you know why this would be and how to make it work at lower compression levels?
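You are throwing away the compressed data returned by compressor.compress(l). The docs say: "Returns a chunk of compressed data if possible, or an empty byte string otherwise." That probably explains the dependence on the level: the bzip2 block size is roughly 100 kB times the compression level, so at levels 8 and 9 your input presumably fits in a single block, compress() returns nothing, and flush() returns the whole stream. At lower levels, completed blocks are returned by compress() and you discard them. You need to do something like this: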
# setup code goes here
for l in lines:
    chunk = compressor.compress(l)
    if chunk: do_something_with(chunk)
chunk = compressor.flush()
if chunk: do_something_with(chunk)
# teardown code goes here
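For a concrete version of the outline above, here is a minimal sketch assuming Python 3 and a list of bytes objects. The helper name compress_lines and the sample data are made up for illustration; the point is only that every chunk returned by compress() is kept and the result of flush() is appended last.

```python
import bz2

def compress_lines(lines, level=9):
    """Compress an iterable of bytes objects incrementally, keeping every chunk."""
    compressor = bz2.BZ2Compressor(level)
    chunks = []
    for line in lines:
        chunk = compressor.compress(line)  # may be b"" while data is still buffered
        if chunk:
            chunks.append(chunk)
    chunks.append(compressor.flush())      # emits whatever is still buffered
    return b"".join(chunks)

# Hypothetical sample input; any list of bytes objects will do.
lines = [b"line %d of some sample text\n" % i for i in range(1000)]
compressed = compress_lines(lines, level=1)
assert bz2.decompress(compressed) == b"".join(lines)
```

Collecting the chunks in a list and joining once at the end avoids repeated bytes concatenation; writing each chunk straight to a file opened in binary mode would work just as well.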
Also note that your one-shot code uses "\n".join(); to check this against the chunked result, use "".join(). Also beware of bytes/str issues, e.g. the above should be b"whatever".join().
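To illustrate both caveats, here is a small sketch assuming Python 3, where bz2 accepts only bytes. The sample strings are invented; the comparison is done on the decompressed data, which is what the separator actually affects.

```python
import bz2

lines = ["first line", "second line", "third line"]  # str, as in the question
encoded = [s.encode("utf-8") for s in lines]          # convert to bytes before compressing

compressor = bz2.BZ2Compressor(9)
# The join consumes every compress() result before flush() is called.
streamed = b"".join(compressor.compress(b) for b in encoded) + compressor.flush()

# The streamed result corresponds to plain concatenation, not a newline join.
assert bz2.decompress(streamed) == b"".join(encoded)

# The one-shot call from the question round-trips to the newline-joined data instead.
one_shot = bz2.compress(b"\n".join(encoded))
assert bz2.decompress(one_shot) == b"\n".join(encoded)
assert bz2.decompress(streamed) != bz2.decompress(one_shot)
```

If you want the two approaches to produce the same decompressed output, either feed the separator to the compressor between lines or join with b"" in the one-shot version.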
What version of Python are you using?