Search code examples
c++file-iocompressionunzip

What is a good way to test a file to see if its a zip file?


I am looking as a new file format specification and the specification says the file can be either xml based or a zip file containing an xml file and other files.

The file extension is the same in both cases. What ways could I test the file to decide if it needs decompressing or just reading?


Solution

  • The zip file format is defined by PKWARE. You can find their file specification here.

    Near the top you will find the header specification:

    A. Local file header:

        local file header signature     4 bytes  (0x04034b50)
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes
    
        file name (variable size)
        extra field (variable size)
    

    From this you can see that the first 4 bytes of the header should be the file signature which should be the hex value 0x04034b50. Byte order in the file is the other way round - PKWARE specify that "All values are stored in little-endian byte order unless otherwise specified.", so if you use a hex editor to view the file you will see 50 4b 03 04 as the first 4 bytes.

    You can use this to check if your file is a zip file. If you open the file in notepad, you will notice that the first two bytes (50 and 4b) are the ASCII characters PK.