
How to read a zip file written with pkzip in python?


I am trying to read a zip file in python that was written with pkzip:

import zipfile
fname = "myfile.zip"
unzipped = zipfile.ZipFile(fname, "r")

But I get this error:

    unzipped = zipfile.ZipFile(fname, "r")
  File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 1222, in __init__
    self._RealGetContents()
  File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 1285, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 282, in _EndRecData
    return _EndRecData64(fpin, -sizeEndCentDir, endrec)
  File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 228, in _EndRecData64
    raise BadZipFile("zipfiles that span multiple disks are not supported")
zipfile.BadZipFile: zipfiles that span multiple disks are not supported

As far as I can tell, this file does not span multiple disks. I say this because:

  1. I checked against the solution in this Stack Overflow answer, and my version of zipfile is already patched.

  2. It unzips fine on the Linux command line with:

    $ unzip myfile.zip
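To double-check point 1 from Python itself, here is a small sketch that reads the source of the installed check. It relies on `zipfile._EndRecData64`, the private helper named in the traceback; because it is private, it may be renamed or restructured in other Python versions, and the string match in the hypothetical helper `has_disk_patch` is only a heuristic.

```python
import inspect
import zipfile


def has_disk_patch(source: str) -> bool:
    """Hypothetical helper: True if the multi-disk check uses `disks > 1`."""
    return "disks > 1" in source and "disks != 1" not in source


if __name__ == "__main__":
    # _EndRecData64 is private to zipfile, so guard against it being absent.
    helper = getattr(zipfile, "_EndRecData64", None)
    if helper is None:
        print("this zipfile has no _EndRecData64 helper")
    else:
        src = inspect.getsource(helper)
        print("patched" if has_disk_patch(src) else "unpatched")
```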

So it doesn't seem to actually be a bad zip file. When I open it with raw file access and read the first few bytes, there is a suggestive header hinting that PKZIP may have formatted this file in an unusual way:

  b'PK\x03
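One way to make the header check more concrete is to compare the first four bytes against the signatures listed in the PKZIP Application Note. This is a minimal sketch: `describe_header` is a hypothetical helper, and only three signatures are covered. `PK\x03\x04` is the local file header an ordinary archive starts with, while `PK\x07\x08` at offset 0 marks the first segment of a split or spanned archive, which is the case the multi-disk error is about.

```python
import sys

# Four-byte signatures defined in the PKZIP Application Note (APPNOTE.TXT).
SIGNATURES = {
    b"PK\x03\x04": "local file header: an ordinary archive starts here",
    b"PK\x07\x08": "spanning signature: first segment of a split/spanned archive",
    b"PK\x05\x06": "end of central directory record: an empty archive",
}


def describe_header(path: str) -> str:
    """Read the first four bytes of *path* and name the ZIP signature, if any."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
    return SIGNATURES.get(magic, f"unrecognised leading bytes: {magic!r}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_header.py myfile.zip
    print(describe_header(sys.argv[1]))
```

A file starting with the local file header is not, by itself, evidence of a multi-disk archive; the disk counts live at the end of the file.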

The Python library documentation for zipfile refers to the PKZIP Application Note:

The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append, and list a ZIP file. Any advanced use of this module will require an understanding of the format, as defined in PKZIP Application Note.

The note is linked from that page. It is very thorough, but I don't see concrete instructions on which options to pass to zipfile in order to parse the file correctly.

PKZIP is in fairly wide use, so I'm surprised not to find more common examples or native support. What options are needed to open a PKZIP-written file in Python when it throws this multiple-disk error?


Solution

  • The patch in the answer you linked changed zipfile from this

    if diskno != 0 or disks != 1:
        raise BadZipFile("zipfiles that span multiple disks are not supported")
    

    to this

    if diskno != 0 or disks > 1:
        raise BadZipFile("zipfiles that span multiple disks are not supported")
    

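    If you are still getting the error "zipfiles that span multiple disks are not supported", it means that diskno != 0 or disks > 1. The traceback shows the error is raised in _EndRecData64, so these two values are read from the ZIP64 end of central directory locator, which sits immediately before the end of central directory record.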

    You need to find out more about the internal structure of myfile.zip.

    Try running zipdetails and check the last section of its output. Below is what the end of a single-disk archive should look like:

    # zipdetails  fred.zip 
    ...
    3CF31 END CENTRAL HEADER    06054B50
    3CF35 Number of this disk   0000
    3CF37 Central Dir Disk no   0000
    3CF39 Entries in this disk  0009
    3CF3B Total Entries         0009
    3CF3D Size of Central Dir   00000317
    3CF41 Offset to Central Dir 0003CC1A
    3CF45 Comment Length        0000
    Done
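If zipdetails is not available, the same trailer fields can be read with Python's `struct` module. The sketch below is an assumption-laden approximation of what zipfile does, not its actual code: it searches the tail of the file for the end of central directory record (`PK\x05\x06`), unpacks it with the same 22-byte layout zipfile uses, and then looks at the 20 bytes just before it for a ZIP64 end of central directory locator (`PK\x06\x07`). The first four result keys reuse the zipdetails labels shown above; the two ZIP64 key names and the helper `disk_fields` are hypothetical.

```python
import struct
import sys

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # ZIP64 end of central directory locator
EOCD_FORMAT = "<4s4H2LH"           # 22 bytes
LOCATOR_FORMAT = "<4sLQL"          # 20 bytes: sig, disk no, offset, total disks
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)
LOCATOR_SIZE = struct.calcsize(LOCATOR_FORMAT)
MAX_COMMENT = 0xFFFF               # archive comment length is a 16-bit field


def disk_fields(path: str) -> dict:
    """Hypothetical helper: return the disk-related fields from the trailer."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)
        size = fh.tell()
        tail_len = min(size, EOCD_SIZE + MAX_COMMENT)
        fh.seek(size - tail_len)
        tail = fh.read()
    # The EOCD record may be followed by a comment, so search backwards for it.
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < EOCD_SIZE:
        raise ValueError("no end of central directory record found")
    (_, this_disk, cd_disk, entries_here, total_entries,
     _cd_size, _cd_offset, _comment_len) = struct.unpack(
        EOCD_FORMAT, tail[pos:pos + EOCD_SIZE])
    result = {
        "Number of this disk": this_disk,
        "Central Dir Disk no": cd_disk,
        "Entries in this disk": entries_here,
        "Total Entries": total_entries,
    }
    # The ZIP64 locator, when present, sits immediately before the EOCD record.
    # (With a near-maximal comment it may fall outside `tail`; not handled here.)
    if pos >= LOCATOR_SIZE:
        sig, z64_disk, _z64_offset, z64_total = struct.unpack(
            LOCATOR_FORMAT, tail[pos - LOCATOR_SIZE:pos])
        if sig == ZIP64_LOCATOR_SIG:
            result["ZIP64 locator disk no"] = z64_disk
            result["ZIP64 locator total disks"] = z64_total
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python disk_fields.py myfile.zip
    for name, value in disk_fields(sys.argv[1]).items():
        print(f"{name:28} {value}")
```

With the patched zipfile, the error is raised when the ZIP64 locator disk number is not 0 or its total disk count is greater than 1; if myfile.zip shows such values, the trailer written into the file itself carries the numbers the check rejects.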