Tags: python, python-3.x, zip, deflate

Original ZIP size limits 2^32 vs 2^32 - 1. Is there an off by 1 error in Wikipedia?


According to https://en.wikipedia.org/wiki/ZIP_(file_format)#ZIP64

The original .ZIP format had a 4 GB (2^32 bytes) limit on various things (uncompressed size of a file, compressed size of a file, and total size of the archive), as well as a limit of 65,535 (2^16-1) entries in a ZIP archive.

Is the 2^32 value correct? By my understanding, the maximum should be the largest value that a 32-bit unsigned integer can hold, which is 2^32-1.

I know that 2^32-1 does have particular meaning according to the ZIP spec at https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT (usually mentioned as 0xFFFFFFFF), so I don't want to assume anything.

The 2^16-1 limit for number of entries does seem right to me, as the maximum value that can be stored in a 16 bit unsigned integer.

Context: I'm writing code to write ZIP files in a streaming way in Python https://github.com/uktrade/stream-zip, as well as code to open ZIP files in a streaming way https://github.com/uktrade/stream-unzip, and I want both to handle the various limits correctly. Or, if not "correctly" (say, if there is no "correctly"), then as well as is reasonable.


Solution

  • They mix up a few things in that sentence, but the limits were 2^32-1 compressed bytes as well as 2^32-1 uncompressed bytes in a single entry, and a start-of-central-directory offset of 2^32-1. And, as stated, 2^16-1 entries.

    Note that the limit on the central-directory offset permits a zip file larger than 4 GiB, but not much larger. So the "total size of the archive" limit mentioned in the Wikipedia page is neither 4 GiB nor 4 GiB – 1. The sentence would need to be broken up to provide exactly correct information.
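For the streaming-writer use case in the question, the limits above could be checked with something like the following. This is only a minimal sketch: the constant names and the `fits_in_original_zip` helper are hypothetical and do not come from stream-zip, stream-unzip, or the ZIP specification. It also treats 2^32-1 purely as the largest storable value, as in the original format; a ZIP64-aware writer may prefer a stricter threshold, because 0xFFFFFFFF is also used as a marker value in APPNOTE.TXT.

```python
import struct

# Largest values that fit in the unsigned fields of the original format.
MAX_UINT32 = 2**32 - 1  # 0xFFFFFFFF: sizes and central-directory offset
MAX_UINT16 = 2**16 - 1  # 0xFFFF: number of entries


def fits_in_original_zip(compressed_size, uncompressed_size,
                         central_directory_offset, num_entries):
    """Hypothetical helper: True if every value fits its field without ZIP64."""
    return (
        compressed_size <= MAX_UINT32
        and uncompressed_size <= MAX_UINT32
        and central_directory_offset <= MAX_UINT32
        and num_entries <= MAX_UINT16
    )


def can_pack_uint32(value):
    """True if value can be stored in a little-endian 32-bit unsigned field."""
    try:
        struct.pack('<I', value)
        return True
    except struct.error:
        return False
```

Here `can_pack_uint32(2**32 - 1)` succeeds while `can_pack_uint32(2**32)` fails, which is the off-by-one distinction the question asks about.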