
Why does UnZip extract the last concatenated ZIP?


I found the following behavior to be unexpected:

$ mkdir tmp && cd tmp/
$ for example in a b c ; do echo "$example" > "$example.txt" ; done
$ for file in *.txt ; do zip "$file.zip" "$file" ; done
$ cat a.txt.zip b.txt.zip c.txt.zip > concatenated.zip
$ unzip concatenated.zip -d output
$ ls output/
c.txt                                     # unexpected

On the other hand, p7zip does this:

$ rm -r output/
$ 7z x concatenated.zip -ooutput/
$ ls output/
a.txt

Why does UnZip extract the last concatenated ZIP? Does it traverse backwards from EOF until it finds the PK file signature?


Solution

  • Does it traverse backwards from EOF until it finds the PK file signature?

    Yes. Here is what unzip will do:

    • look for the "end of central directory record" (EOCD), searching backwards from the end of the file (the record can be followed by a zip comment of up to 65535 bytes)
    • read the record and follow the "offset of start of central directory"
    • read the central directory (it lists every entry in the archive)
    • for each entry, follow the "relative offset of local header"
    • read the local file header and the data that follows it, and extract the file
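The steps above can be sketched in Python with only the standard library. This is a minimal, illustrative reader, not unzip's code: it assumes a single-disk, non-ZIP64, unencrypted archive, handles only stored and deflated entries, and trusts the stored offsets as-is (unlike unzip, it does not correct for prepended bytes). The function name `read_entries` is hypothetical.

```python
import io
import struct
import zipfile
import zlib


def read_entries(data: bytes) -> dict:
    """Extract every entry by following EOCD -> central directory -> local headers.

    Sketch only: no ZIP64, no encryption, single disk, and stored offsets are
    trusted as-is (no correction for prepended data).
    """
    # 1. The EOCD (22 bytes + optional comment) sits at the end, so search
    #    backwards for its signature. A real reader validates the hit more carefully.
    eocd = data.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("no end of central directory record found")
    # 2. Fields at +10: total entries, central directory size, central directory offset.
    total, _cd_size, cd_offset = struct.unpack_from("<H2L", data, eocd + 10)

    entries = {}
    pos = cd_offset
    for _ in range(total):
        # 3. One 46-byte central directory header (+ name, extra, comment) per entry.
        f = struct.unpack_from("<4s6H3L5H2L", data, pos)
        if f[0] != b"PK\x01\x02":
            raise ValueError(f"no central directory header at offset {pos}")
        method, csize = f[4], f[8]
        name_len, extra_len, comment_len, local_offset = f[10], f[11], f[12], f[16]
        name = data[pos + 46 : pos + 46 + name_len].decode("utf-8")
        pos += 46 + name_len + extra_len + comment_len

        # 4. Follow the "relative offset of local header" (30-byte fixed part).
        lf = struct.unpack_from("<4s5H3L2H", data, local_offset)
        if lf[0] != b"PK\x03\x04":
            raise ValueError(f"no local header at offset {local_offset}")
        start = local_offset + 30 + lf[9] + lf[10]
        raw = data[start : start + csize]

        # 5. Decompress: method 0 is stored, method 8 is raw deflate.
        if method == 0:
            entries[name] = raw
        elif method == 8:
            entries[name] = zlib.decompress(raw, -15)
        else:
            raise ValueError(f"{name}: compression method {method} not handled")
    return entries


# Usage: build a small archive in memory and walk it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("c.txt", "c\n")
print(read_entries(buf.getvalue()))  # {'c.txt': b'c\n'}
```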

    In your case, unzip only finds the last EOCD, the one belonging to c.txt.zip, and its offsets are wrong because you prepended bytes (a.txt.zip and b.txt.zip) to it. That's why unzip tells you:

    warning [concatenated.zip]:  324 extra bytes at beginning or within zipfile
      (attempting to process anyway)
    

    It then finds the central directory of c.txt.zip, sees only one entry (c.txt), and extracts only that file.
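As far as I can tell, the "extra bytes" count comes from comparing where the last EOCD was actually found with where its own fields say it should be (central directory offset + central directory size). A minimal Python sketch of that arithmetic, assuming in-memory archives built with the standard zipfile module instead of the zip CLI, so the byte counts differ from the 324 above; `make_zip` and `extra_leading_bytes` are hypothetical helpers.

```python
import io
import struct
import zipfile


def make_zip(name: str, text: str) -> bytes:
    """Hypothetical helper: one-entry archive built in memory, standing in for `zip name.zip name`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, text)
    return buf.getvalue()


def extra_leading_bytes(data: bytes) -> int:
    """Gap between where the last EOCD is and where its own offsets say it should be."""
    eocd = data.rfind(b"PK\x05\x06")  # last EOCD = the one from c.txt.zip
    cd_size, cd_offset = struct.unpack_from("<2L", data, eocd + 12)
    # Offsets are relative to the start of c.txt.zip, so the expected EOCD
    # position is cd_offset + cd_size; anything beyond that was prepended.
    return eocd - (cd_offset + cd_size)


parts = [make_zip(n, n[0] + "\n") for n in ("a.txt", "b.txt", "c.txt")]
concatenated = b"".join(parts)
print(extra_leading_bytes(concatenated) == len(parts[0]) + len(parts[1]))  # True

# Python's zipfile also starts from the last EOCD and only sees c.txt.
with zipfile.ZipFile(io.BytesIO(concatenated)) as zf:
    print(zf.namelist())  # ['c.txt']
```

Python's zipfile applies the same kind of correction when it opens the file (it shifts every offset by the computed gap), which is why it can still read c.txt.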

    Given the structure of zip files, I would say this is the logical behavior. Self-extracting zip files rely on it: the file starts with an executable that extracts the archive and ends with the actual zip content (see unzipsfx and zip -A).

    It looks like 7z only tries from the end if the file doesn't start like a zip file:

    # not a.txt.zip, but a.txt
    $ cat a.txt b.txt.zip c.txt.zip > prepended.zip
    # fix offset
    $ zip -A prepended.zip
    
    $ unzip -l prepended.zip 
    Archive:  prepended.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            2  2016-11-22 20:29   c.txt
    ---------                     -------
            2                     1 file
    
    
    $ 7z l prepended.zip 
    [...]
    Path = prepended.zip
    Warning: The archive is open with offset
    Type = zip
    Physical Size = 326
    Embedded Stub Size = 164
    
       Date      Time    Attr         Size   Compressed  Name
    ------------------- ----- ------------ ------------  ------------------------
    2016-11-22 20:29:05 .....            2            2  c.txt
    ------------------- ----- ------------ ------------  ------------------------
    2016-11-22 20:29:05                  2            2  1 files
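
I have not checked 7-Zip's source, so the following is only a hypothetical heuristic that happens to reproduce both observations: if the file begins with a local file header, walk the local headers forward from offset 0; otherwise fall back to the last EOCD (here via Python's zipfile). All function names are made up for the illustration.

```python
import io
import struct
import zipfile


def make_zip(name: str, text: str) -> bytes:
    """Hypothetical helper: one-entry archive built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, text)
    return buf.getvalue()


def scan_from_start(data: bytes) -> list:
    """Walk local file headers forward from offset 0; return the entry names.

    Stops at the first block that is not a local header, which in a normal
    archive is the central directory. Entries whose sizes live in a data
    descriptor (flag bit 3) are not handled.
    """
    names, pos = [], 0
    while data.startswith(b"PK\x03\x04", pos):
        f = struct.unpack_from("<4s5H3L2H", data, pos)
        flags, csize, name_len, extra_len = f[2], f[7], f[9], f[10]
        if flags & 0x08:
            raise ValueError("sizes stored in a data descriptor; not handled")
        names.append(data[pos + 30 : pos + 30 + name_len].decode("utf-8"))
        pos += 30 + name_len + extra_len + csize  # jump to the next header
    return names


def guess_entries(data: bytes) -> list:
    """Hypothetical heuristic: read from the start if it looks like a zip, else from the EOCD."""
    names = scan_from_start(data)
    if names:
        return names
    with zipfile.ZipFile(io.BytesIO(data)) as zf:  # falls back to the last EOCD
        return zf.namelist()


a_zip, b_zip, c_zip = [make_zip(n, n[0] + "\n") for n in ("a.txt", "b.txt", "c.txt")]
print(guess_entries(a_zip + b_zip + c_zip))   # ['a.txt']
print(guess_entries(b"a\n" + b_zip + c_zip))  # ['c.txt']
```

In the concatenated file, the forward walk stops at a.txt.zip's own central directory, so it never reaches b.txt or c.txt.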
    

    About zip -A, used above to fix the offsets (from the zip man page):

    The -A option tells zip to adjust the entry offsets stored in the archive to take into account this "preamble" data.
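The adjustment described there can be sketched as: measure the preamble (the gap between where the central directory really is and where the EOCD says it is), then add that length to the central directory offset in the EOCD and to every local header offset in the central directory. A minimal Python sketch, assuming a single-disk, non-ZIP64 archive; `adjust_offsets` and `make_zip` are hypothetical and this is not Info-ZIP's implementation.

```python
import io
import struct
import zipfile


def adjust_offsets(data: bytes) -> bytes:
    """Add the preamble length to every stored offset (the idea behind `zip -A`)."""
    out = bytearray(data)
    eocd = data.rfind(b"PK\x05\x06")
    # EOCD fields at +10: total entries, central directory size, central directory offset.
    total, cd_size, cd_offset = struct.unpack_from("<H2L", data, eocd + 10)
    preamble = eocd - (cd_offset + cd_size)  # bytes in front of the real archive
    if preamble == 0:
        return bytes(out)
    struct.pack_into("<L", out, eocd + 16, cd_offset + preamble)

    pos = cd_offset + preamble  # where the central directory actually is
    for _ in range(total):
        # Name/extra/comment lengths at +28, local header offset at +42.
        name_len, extra_len, comment_len = struct.unpack_from("<3H", data, pos + 28)
        (local_offset,) = struct.unpack_from("<L", data, pos + 42)
        struct.pack_into("<L", out, pos + 42, local_offset + preamble)
        pos += 46 + name_len + extra_len + comment_len
    return bytes(out)


def make_zip(name: str, text: str) -> bytes:
    """Hypothetical helper: one-entry archive built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, text)
    return buf.getvalue()


# Mirror the `cat a.txt b.txt.zip c.txt.zip > prepended.zip` example.
prepended = b"a\n" + make_zip("b.txt", "b\n") + make_zip("c.txt", "c\n")
fixed = adjust_offsets(prepended)
eocd = fixed.rfind(b"PK\x05\x06")
cd_size, cd_offset = struct.unpack_from("<2L", fixed, eocd + 12)
print(eocd == cd_offset + cd_size)  # True: no more "extra bytes"
```

After the rewrite the stored offsets match the file layout, so a reader no longer has to infer the preamble length.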

    I don't know what you are trying to achieve, but concatenating zip files is probably not the easiest approach: getting the individual archives back out afterwards will not be easy.