Unpack a bz2 URL without a temporary file in Python


I want to unpack data from a bz2 URL directly into a target file. Here is the code:

filename = 'temp.file'  
req = urllib2.urlopen('http://example.com/file.bz2')
CHUNK = 16 * 1024
with open(filename, 'wb') as fp:
  while True:
    chunk = req.read(CHUNK)
    if not chunk: break
    fp.write(bz2.decompress(chunk)) 
fp.close()

bz2.decompress(chunk) raises an error: ValueError: couldn't find end of stream


Solution

  • Use bz2.BZ2Decompressor to decompress sequentially. bz2.decompress expects a complete bz2 stream, so it fails on a partial chunk:

    import bz2
    import urllib2

    filename = 'temp.file'
    req = urllib2.urlopen('http://example.com/file.bz2')
    CHUNK = 16 * 1024

    # A single decompressor object keeps its state across chunks.
    decompressor = bz2.BZ2Decompressor()
    with open(filename, 'wb') as fp:
        while True:
            chunk = req.read(CHUNK)
            if not chunk:
                break
            fp.write(decompressor.decompress(chunk))
    req.close()

    BTW, you don't need to call fp.close() as long as you use a with statement.
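
For anyone who wants to try the idea without a network connection, here is a minimal sketch of the same chunked loop. It assumes Python 3 (the code above is Python 2, since it uses urllib2), and decompress_stream is a hypothetical helper name, not something from the bz2 module. It reads from any object with read() and writes to any object with write(), so an in-memory buffer can stand in for the URL:

```python
import bz2
import io

CHUNK = 16 * 1024  # same chunk size as in the answer above


def decompress_stream(src, dst, chunk_size=CHUNK):
    """Hypothetical helper: copy bz2 data from src to dst, decompressing chunk by chunk."""
    # One decompressor object carries the stream state between chunks,
    # which is what the one-shot bz2.decompress() cannot do.
    decompressor = bz2.BZ2Decompressor()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))


# In-memory stand-ins for the HTTP response and the target file.
payload = b"hello bz2 " * 10000
src = io.BytesIO(bz2.compress(payload))
dst = io.BytesIO()

# A deliberately small chunk size forces several partial chunks.
decompress_stream(src, dst, chunk_size=64)
assert dst.getvalue() == payload
```

With a real URL, passing the response from urllib.request.urlopen(...) as src and a file opened with 'wb' as dst should work the same way, since both expose read() and write(). Note that this sketch handles a single bz2 stream only; a file made of several concatenated streams would need extra handling.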