bz2.decompress in Python gives different results for what looks like the same string


I have a string like this:

un: 'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
pw: 'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'

and this is my code:

un = re.search(r"un: '(.+)'",page).group(1)
bz2.decompress(un)

When I call bz2.decompress on the matched string, it raises an error:

IOError: invalid data stream

But when I paste the string into the source as a literal instead:

un = 'BZh91...\x084'
bz2.decompress(un)

and it returns the correct answer.

Supplement: this is my complete code.

#!/usr/bin/env python
import urllib
import re 
import bz2

def main():
    page=urllib.urlopen("http://www.pythonchallenge.com/pc/def/integrity.html").read()
    unstring = re.search(r"un: *'(.+)'",page).group(1)
    print unstring
    un = "BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084"
    #the string un is copied from output of 'print unstring'
    print bz2.decompress (un)
    print bz2.decompress (unstring)
if (__name__=="__main__"):
    main()

This is the output:

==== No Subprocess ====
>>> 
BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084
huge
Traceback (most recent call last):
  File "/home/terry/pythonchallage/pythonchallenge_8.py", line 16, in <module>
    main()
  File "/home/terry/pythonchallage/pythonchallenge_8.py", line 14, in main
    print bz2.decompress (unstring)
IOError: invalid data stream
>>> 

Solution

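  • The string you scrape from the page is not the same as the literal in your source. In the scraped text each \xhh is four literal characters (a backslash, an x, and two hex digits), not a single escaped byte. When you paste the same text into your source as a string literal, Python interprets the escapes for you, which is why that version decompresses correctly.

    You first need to tell Python to interpret those escapes in the scraped string: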

    bz2.decompress(un.decode('string_escape'))
    

    Demo:

    >>> unstring = r'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
    >>> print unstring
    BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084
    >>> unstring
    'BZh91AY&SYA\\xaf\\x82\\r\\x00\\x00\\x01\\x01\\x80\\x02\\xc0\\x02\\x00 \\x00!\\x9ah3M\\x07<]\\xc9\\x14\\xe1BA\\x06\\xbe\\x084'
    >>> import bz2
    >>> bz2.decompress(unstring)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: invalid data stream
    >>> bz2.decompress(unstring.decode('string_escape'))
    'huge'