How to check if a zip file is split across multiple archives using Python's zipfile library?


According to the ZIP file standard (http://www.pkware.com/documents/casestudies/APPNOTE.TXT), the format also supports splitting a zip file across multiple files:

      Spanned/Split archives created using PKZIP for Windows
      (V2.50 or greater), PKZIP Command Line (V2.50 or greater),
      or PKZIP Explorer will include a special spanning 
      signature as the first 4 bytes of the first segment of
      the archive.  This signature (0x08074b50) will be 
      followed immediately by the local header signature for
      the first file in the archive.  

      A special spanning marker may also appear in spanned/split 
      archives if the spanning or splitting process starts but 
      only requires one segment.  In this case the 0x08074b50 
      signature will be replaced with the temporary spanning 
      marker signature of 0x30304b50.  Split archives can
      only be uncompressed by other versions of PKZIP that
      know how to create a split archive.

      The signature value 0x08074b50 is also used by some
      ZIP implementations as a marker for the Data Descriptor 
      record.  Conflict in this alternate assignment can be
      avoided by ensuring the position of the signature
      within the ZIP file to determine the use for which it
      is intended.  

Is there a way to check for that signature, or some other way to check whether a zip is split across multiple files?


Solution

  • The particular signature the standard is talking about, 0x08074b50 (stored on disk as the bytes PK\x07\x08), is not handled by zipfile at all, as can be seen by grepping the library source (I got the same result with Python 3.2):

    # grep PK /usr/lib/python2.7/zipfile.py 
    
    stringEndArchive = "PK\005\006"
    stringCentralDir = "PK\001\002"
    stringFileHeader = "PK\003\004"
    stringEndArchive64Locator = "PK\x06\x07"
    stringEndArchive64 = "PK\x06\x06"
    

    So I doubt that you can use the library for that purpose. You would have to look for the signature yourself, either by reading the first four bytes of the first segment directly or by extending the library.
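
A minimal sketch of checking for the signature by hand, based only on the part of the standard quoted in the question. The function name spanning_marker and its return values are made up for illustration and are not part of zipfile. Requiring a local file header right after the marker is an extra sanity check assumed here, so that a stray 0x08074b50 used as a data descriptor is not mistaken for the spanning signature.

```python
import struct

# Signatures from APPNOTE.TXT; ZIP stores them little-endian on disk.
SPANNING_SIGNATURE = struct.pack("<I", 0x08074B50)    # b"PK\x07\x08"
TEMP_SPANNING_MARKER = struct.pack("<I", 0x30304B50)  # b"PK00"
LOCAL_FILE_HEADER = b"PK\x03\x04"                     # stringFileHeader in zipfile.py


def spanning_marker(path):
    """Hypothetical helper: classify a file by its first 8 bytes.

    Returns "split" for the spanning signature, "single-segment" for the
    temporary spanning marker, and None otherwise.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    # Assumption: the marker only counts at offset 0 and must be followed
    # immediately by the local header of the first file in the archive.
    if head[4:8] != LOCAL_FILE_HEADER:
        return None
    if head[:4] == SPANNING_SIGNATURE:
        return "split"
    if head[:4] == TEMP_SPANNING_MARKER:
        return "single-segment"
    return None
```

The function has to be pointed at the first segment of the archive, since that is the only place the standard puts the marker. A result of None only means the marker was not found at the start of that file. It does not prove the archive is unsplit, because the quoted text only promises the marker for archives written by the listed PKZIP versions.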