Tags: python, http, gzip, ioerror, content-encoding

Gunzipping Contents of a URL - Python


I'm back. :) I'm again trying to get the gzipped contents of a URL and gunzip them, this time in Python. The #SERVER section of code is the script I'm using to generate the gzipped data; the data is known good, as it works with Java. The #CLIENT section is the code I'm using client-side to try to read that data (for eventual JSON parsing). However, somewhere in this transfer, the gzip module forgets how to read the data it created.

#SERVER
import StringIO, gzip

# Compress the payload into an in-memory buffer.
outbuf = StringIO.StringIO()
outfile = gzip.GzipFile(fileobj = outbuf, mode = 'wb')
outfile.write(data)
outfile.close()

# Print the header (the trailing "\n" ends the header block), then the gzipped bytes.
print "Content-Encoding: gzip\n"
print outbuf.getvalue()
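
A side note on the server snippet (not necessarily the cause of the error below, since the same bytes decode fine in Java): Python 2's print statement appends a newline to whatever it writes, so the gzipped body goes out with an extra trailing byte. A minimal sketch of the same tail using sys.stdout.write, which emits the bytes verbatim:

#SERVER (alternative tail that avoids print's trailing newline)
import sys
sys.stdout.write("Content-Encoding: gzip\n\n")   # blank line ends the CGI headers
sys.stdout.write(outbuf.getvalue())              # gzipped bytes, nothing appended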

#CLIENT
import urllib2, StringIO, gzip

# Request the URL, advertising that any content encoding is acceptable.
urlReq = urllib2.Request(url)
urlReq.add_header('Accept-Encoding', '*')
urlConn = urllib2.build_opener().open(urlReq)

# Wrap the raw response bytes so GzipFile can read them like a file.
urlConnObj = StringIO.StringIO(urlConn.read())
gzin = gzip.GzipFile(fileobj = urlConnObj)
return gzin.read() #IOError: Not a gzipped file.

Other Notes:

outbuf.getvalue() on the server matches urlConnObj.getvalue() and urlConn.read() on the client
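
One quick way to check that claim on the client side is to look at the first two bytes of what was actually received; a real gzip stream always starts with the magic bytes \x1f\x8b. A small diagnostic sketch, reusing the urlConnObj buffer from the client code above:

# Diagnostic sketch: confirm the client really received a gzip stream.
raw = urlConnObj.getvalue()
print len(raw), repr(raw[:2])   # a gzip stream starts with '\x1f\x8b'
print raw[:2] == '\x1f\x8b'     # False here would explain the IOError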


Solution

  • This StackOverflow question seemed to help me out.

    Apparently, it was wiser to bypass the gzip module entirely and use zlib instead. Changing "*" to "gzip" in the "Accept-Encoding" header may also have helped.

    #CLIENT
    import urllib2, zlib

    # Ask specifically for gzip, then decompress the raw response with zlib.
    urlReq = urllib2.Request(url)
    urlReq.add_header('Accept-Encoding', 'gzip')
    urlConn = urllib2.urlopen(urlReq)
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    return zlib.decompress(urlConn.read(), 16+zlib.MAX_WBITS)
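
    For what it's worth, the second argument to zlib.decompress is the wbits parameter: zlib.MAX_WBITS alone expects a bare zlib stream, adding 16 makes it expect a gzip header and trailer, and adding 32 lets it auto-detect either format. A self-contained sketch of the difference (the JSON string is just a stand-in for the real payload):

    # Sketch: gzip-compress a sample string, then decompress it with zlib.
    import zlib, gzip, StringIO

    buf = StringIO.StringIO()
    gz = gzip.GzipFile(fileobj = buf, mode = 'wb')
    gz.write('{"hello": "world"}')   # stand-in for the real JSON payload
    gz.close()

    print zlib.decompress(buf.getvalue(), 16+zlib.MAX_WBITS)   # expects gzip framing
    print zlib.decompress(buf.getvalue(), 32+zlib.MAX_WBITS)   # auto-detects gzip or zlib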