I'm trying to write a collection of YARA signatures that will tag zip files based on artifacts of their creation.
I understand that the EOCD (end of central directory record) has a magic number of 0x06054b50 and that it is located at the end of the archive structure. It has a variable-length comment field with a maximum length of 0xFFFF bytes, so the EOCD could be up to 0xFFFF + 22 bytes long (22 bytes being the fixed part of the record). However, there could be data after the zip structure that would throw off any offset-dependent scanning.
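For reference, the fixed 22-byte part of the EOCD described above can be unpacked with Python's `struct` module. This is a minimal sketch based on the record layout in PKWARE's APPNOTE. The field names, the `Eocd` type and `parse_eocd` are hypothetical, chosen only for illustration.

```python
import struct
from typing import NamedTuple

EOCD_SIG = 0x06054B50
EOCD_FORMAT = "<IHHHHIIH"  # little-endian fixed part of the EOCD, no padding
EOCD_FIXED_SIZE = struct.calcsize(EOCD_FORMAT)  # 22 bytes
MAX_EOCD_SIZE = EOCD_FIXED_SIZE + 0xFFFF  # fixed part plus the longest comment


class Eocd(NamedTuple):
    signature: int           # 0x06054b50
    disk_number: int         # number of this disk
    cd_start_disk: int       # disk where the central directory starts
    cd_records_on_disk: int  # central directory records on this disk
    cd_records_total: int    # total central directory records
    cd_size: int             # size of the central directory in bytes
    cd_offset: int           # offset of the start of the central directory
    comment_length: int      # length of the trailing comment


def parse_eocd(data: bytes, offset: int) -> Eocd:
    """Unpack the fixed part of an EOCD record that starts at `offset`.

    Raises struct.error if fewer than 22 bytes are available there.
    """
    return Eocd(*struct.unpack_from(EOCD_FORMAT, data, offset))
```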
Is there any way to locate the record without scanning the whole file for the magic bytes? How do you validate that the magic bytes aren't there by coincidence if there can be data after the EOCD?
This is typically done by scanning backwards from the end of the file until you find the EOCD signature. Yes, it is possible to find the same signature embedded in the comment, so you need to check other parts of the EOCD record to see if they are consistent with the file you are reading.
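A minimal Python sketch of that backwards scan, assuming the whole file is already in memory as `bytes`. By default it only searches the last 22 + 0xFFFF bytes, which is where the record must start if nothing but its comment follows it; `max_distance` is a hypothetical parameter you could raise if you expect trailing data. The name `eocd_candidates` is also made up for illustration. It returns every match, nearest the end first, because any one of them may be a coincidence that further field checks have to rule out.

```python
EOCD_SIG = b"PK\x05\x06"  # 0x06054b50 as it appears on disk (little-endian)
EOCD_FIXED_SIZE = 22      # EOCD length without the comment
MAX_COMMENT = 0xFFFF      # largest value the 2-byte comment length can hold


def eocd_candidates(data: bytes,
                    max_distance: int = EOCD_FIXED_SIZE + MAX_COMMENT) -> list[int]:
    """Return offsets of possible EOCD records, nearest the end of data first.

    Only the last `max_distance` bytes are searched.
    """
    if len(data) < EOCD_FIXED_SIZE:
        return []
    window_start = max(0, len(data) - max_distance)
    # A record needs room for its 22-byte fixed part, so the signature
    # cannot start later than len(data) - EOCD_FIXED_SIZE.
    end = len(data) - EOCD_FIXED_SIZE + len(EOCD_SIG)
    candidates = []
    pos = data.rfind(EOCD_SIG, window_start, end)
    while pos != -1:
        candidates.append(pos)
        # Continue backwards: look for an occurrence starting before pos.
        pos = data.rfind(EOCD_SIG, window_start, pos + len(EOCD_SIG) - 1)
    return candidates
```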
For example, if the EOCD record isn't at the end of the file, the comment length field in the EOCD cannot be zero; it should match the number of bytes left in the file after the fixed-size part of the record.
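The comment-length check might look like this in Python. Like the text, this sketch assumes nothing follows the archive, so the bytes after the 22-byte fixed part must be exactly the comment. The helper name `comment_length_consistent` is hypothetical. On its own the check is not conclusive: a signature embedded in a comment can still pass if the bytes after it happen to form a matching length, which is why it is worth combining with other field checks.

```python
import struct

EOCD_SIG = 0x06054B50
EOCD_FIXED_SIZE = 22
COMMENT_LENGTH_OFFSET = 20  # comment length is the last 2-byte field of the fixed part


def comment_length_consistent(data: bytes, offset: int) -> bool:
    """Check that an EOCD candidate at `offset` has a comment ending exactly at EOF."""
    # The fixed part of the record must fit inside the data.
    if offset < 0 or offset + EOCD_FIXED_SIZE > len(data):
        return False
    (signature,) = struct.unpack_from("<I", data, offset)
    if signature != EOCD_SIG:
        return False
    (comment_length,) = struct.unpack_from("<H", data, offset + COMMENT_LENGTH_OFFSET)
    # Everything after the fixed part should be the comment and nothing else.
    remaining = len(data) - (offset + EOCD_FIXED_SIZE)
    return comment_length == remaining
```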
Similarly, if this is a single-disk archive, the offset of the start of the central directory needs to point somewhere within the zip archive, before the EOCD record. If you follow that offset, you should find the signature of a central directory record (0x02014b50).
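A sketch of the central directory check, under some loudly stated assumptions: a single-disk, non-Zip64 archive that starts at byte 0 of the file (a self-extracting stub or other prepended data would shift every offset). The function name `central_directory_consistent` is hypothetical. It checks that the directory ends at or before the EOCD and that its declared offset lands on a central directory header signature, `PK\x01\x02` (0x02014b50).

```python
import struct

CD_HEADER_SIG = b"PK\x01\x02"  # central directory file header, 0x02014b50
EOCD_FIXED_SIZE = 22


def central_directory_consistent(data: bytes, eocd_offset: int) -> bool:
    """Check the central directory fields of an EOCD candidate at eocd_offset.

    Assumes the archive begins at byte 0 of `data`. Zip64 archives, which may
    store 0xFFFF/0xFFFFFFFF placeholders in these fields, are not handled.
    """
    if eocd_offset < 0 or eocd_offset + EOCD_FIXED_SIZE > len(data):
        return False
    # The six fields between the signature and the comment length.
    (disk, cd_disk, entries_here, entries_total,
     cd_size, cd_offset) = struct.unpack_from("<HHHHII", data, eocd_offset + 4)
    # Single-disk archive: everything lives on disk 0.
    if disk != 0 or cd_disk != 0 or entries_here != entries_total:
        return False
    # The central directory must end at or before the EOCD record.
    if cd_offset + cd_size > eocd_offset:
        return False
    if entries_total == 0:
        return cd_size == 0
    # Following the offset should land on a central directory header.
    return data[cd_offset:cd_offset + 4] == CD_HEADER_SIG
```

In practice you would run this on each candidate offset from the backwards scan and keep the first one that passes.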
And so on.
Note that I've ignored the complications of the Zip64 records and encryption records, but the principle is the same: you need to check that the fields in the records are consistent with the file being read.