I am trying to read a zip file in Python that was written with PKZIP:
import zipfile
fname = "myfile.zip"
unzipped = zipfile.ZipFile(fname, "r")
But I get this error:
unzipped = zipfile.ZipFile(fname, "r")
File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 1222, in __init__
self._RealGetContents()
File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 1285, in _RealGetContents
endrec = _EndRecData(fp)
File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 282, in _EndRecData
return _EndRecData64(fpin, -sizeEndCentDir, endrec)
File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 228, in _EndRecData64
raise BadZipFile("zipfiles that span multiple disks are not supported")
zipfile.BadZipFile: zipfiles that span multiple disks are not supported
As far as I can tell, this file does not span multiple disks. I say this because:
I checked against the solution in this Stack Overflow answer, and my version of zipfile already includes that patch.
It unzips fine with:
$ unzip myfile.zip
on the Linux command line.
So it does not seem to be a bad zip file. Reading the first few bytes with raw file access shows a header suggesting that PKZIP may be formatting this file in an unusual way:
b'PK\x03
The Python library documentation for zipfile refers to a PKZIP Application Note:
The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append, and list a ZIP file. Any advanced use of this module will require an understanding of the format, as defined in PKZIP Application Note.
The documentation links to that note. It is very thorough, but I don't see concrete instructions on which options to pass to zipfile in order to parse the file correctly.
PKZIP is in fairly wide use, so I'm surprised not to find more common examples or native support. What options are necessary to open a PKZIP-created file in Python that throws this multiple-disk error?
The patch in the link you posted changed zipfile from this:
if diskno != 0 or disks != 1:
    raise BadZipFile("zipfiles that span multiple disks are not supported")
to this:
if diskno != 0 or disks > 1:
    raise BadZipFile("zipfiles that span multiple disks are not supported")
If you are still getting the error "zipfiles that span multiple disks are not supported", it means that diskno != 0 or disks > 1.
You need to find out more about the internal structure of myfile.zip.
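To see which of the two conditions is being hit, you can read the same fields that zipfile checks. The traceback shows the error is raised in _EndRecData64, which inspects the ZIP64 end of central directory locator, a 20-byte record that sits directly before the end of central directory record. Below is a minimal sketch of that lookup. The function name read_zip64_locator and the constant names are made up for illustration, and it assumes the last occurrence of the PK\x05\x06 signature is the real end record, which can be wrong if the archive comment happens to contain those bytes.

```python
import struct

# Layout of the ZIP64 end of central directory locator:
# signature, disk holding the ZIP64 end record, offset of that record, total disks
ZIP64_LOCATOR_SIG = b"PK\x06\x07"
ZIP64_LOCATOR_FMT = "<4sLQL"
ZIP64_LOCATOR_SIZE = struct.calcsize(ZIP64_LOCATOR_FMT)  # 20 bytes
EOCD_SIG = b"PK\x05\x06"


def read_zip64_locator(data):
    """Return (diskno, disks) from the ZIP64 locator in `data`, or None if absent.

    `data` is the raw bytes of the archive (or at least its tail).
    """
    # Assumption: the last EOCD signature in the data is the real one.
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < ZIP64_LOCATOR_SIZE:
        return None  # no EOCD found, or no room for a locator before it
    start = eocd_pos - ZIP64_LOCATOR_SIZE
    sig, diskno, _offset, disks = struct.unpack(
        ZIP64_LOCATOR_FMT, data[start:eocd_pos]
    )
    if sig != ZIP64_LOCATOR_SIG:
        return None  # not a ZIP64 archive
    return diskno, disks
```

Calling read_zip64_locator(open("myfile.zip", "rb").read()) should return something like (0, 1) for an ordinary single-disk ZIP64 archive. Any other pair tells you which field the writer filled in differently.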
Try running zipdetails and check the last section of its output. Below is what a single-disk archive should look like:
# zipdetails fred.zip
...
3CF31 END CENTRAL HEADER 06054B50
3CF35 Number of this disk 0000
3CF37 Central Dir Disk no 0000
3CF39 Entries in this disk 0009
3CF3B Total Entries 0009
3CF3D Size of Central Dir 00000317
3CF41 Offset to Central Dir 0003CC1A
3CF45 Comment Length 0000
Done
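If zipdetails is not available, the same END CENTRAL HEADER fields can be pulled out with a few lines of Python. This is only a rough sketch: the function name read_eocd and the dictionary keys are invented for illustration, and it assumes the last PK\x05\x06 signature in the file starts the real record.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # stored little-endian, shown by zipdetails as 06054B50
EOCD_FMT = "<4s4H2LH"      # signature, 4 shorts, 2 longs, comment length
EOCD_SIZE = struct.calcsize(EOCD_FMT)  # 22 bytes


def read_eocd(data):
    """Return the end of central directory fields from `data`, or None."""
    # Assumption: the last signature match is the real record.
    pos = data.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_SIZE > len(data):
        return None
    (_sig, disk_number, central_dir_disk, entries_this_disk,
     total_entries, central_dir_size, central_dir_offset,
     comment_length) = struct.unpack_from(EOCD_FMT, data, pos)
    return {
        "disk_number": disk_number,
        "central_dir_disk": central_dir_disk,
        "entries_this_disk": entries_this_disk,
        "total_entries": total_entries,
        "central_dir_size": central_dir_size,
        "central_dir_offset": central_dir_offset,
        "comment_length": comment_length,
    }
```

For a single-disk archive like the one above, disk_number and central_dir_disk should both be 0 and the two entry counts should match.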