Search code examples
pythonbinaryzipbuffer

Unzip file from hexadecimal string in Python


I have a hexadecimal string beginning with '504B030414...' which should be a zipped XML file.

Using Python, how can I unzip/read this file? Using below, I get "zipfile.BadZipFile: File is not a zip file" (string purposely truncated):

import zipfile
import io
original_zip_data = "504B030414..."
filebytes = io.StringIO(original_zip_data)
myzipfile = zipfile.ZipFile(filebytes)

Below linked answer was helpful using type bytes: b'PK\x03\x04...' Unzip buffer with Python?

Is there a way to convert my string to a similar format, or is my string truly a corrupted/"BadZipFile"?


Solution

  • The problem is you need to convert the hexadecimal string to bytes. io.StringIO class is to be used with text data not binary data.

    Here is the correct way to go about it:

    import zipfile
    import io
    
    #Hex string/ Zip file
    original_zip_data = "504B030414..."
    
    
    # Convert to bytes
    filebytes = io.BytesIO(bytes.fromhex(original_zip_data))
    
    # Read the file and print
    with zipfile.ZipFile(filebytes, 'r') as myzipfile:
        file_content = myzipfile.read('fileinside.txt') 
        print(file_content.decode('utf-8'))
    

    In this example we convert the hex string to bytes, create a file-like object, and then read the file inside the zip file.