
Force opening and reading zip files from PHP


This may be a simple question or a pretty complex one; I'll let you be the judge.

Using PHP to open a zip file, extract the files to a directory, and close the zip file is not complicated to implement.

But let's say the file is not a zip, yet WinRAR can still read it. Examples of such files are EXEs, SFX archives, etc.

What do all these files have in common that allows WinRAR to browse their contents?

Another example is antivirus software, which scans the individual files within an EXE.

So, as an example:

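$handle = fopen("an_unknown_file.abc", "rb");
while (!feof($handle))
{
    // Read the next chunk so the loop advances and eventually reaches EOF
    $chunk = fread($handle, 8192);
    // What generic code could I use to determine whether the file can be extracted?
}
fclose($handle);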

Regards.


Solution

  • The ZIP specification allows the actual "zip" portion to be embedded ANYWHERE within a file. It doesn't have to start at position 0. This is how self-extracting zips work: a small .exe stub program with a larger .zip file appended to the end of it.

    Finding a zip is mostly a matter of scanning a file for a zip's "magic number" (the bytes "PK\x03\x04" at the start of each stored entry), then applying a few heuristics to determine whether it's really a zip file or just random data that happens to contain those bytes.

    A .docx file is really just a .zip containing the various XML files that represent a Word document's contents, just like a .jar is a zip file containing compiled Java classes and resources.

    WinRAR has a bunch of extra code to scan through a file and look for any identifiable "this is a compressed archive" signatures, one of which happens to be the zip signature.

    There's nothing too magical about it. It's just a matter of scanning through a file and looking for signatures.
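
As a rough illustration of the signature scan and heuristics described above, here is a minimal PHP sketch. The function names findFirstZipSignature and looksLikeZip are made up for this example; the signature bytes and the 22-byte end-of-central-directory record layout come from the ZIP format. It assumes a single-file, non-ZIP64 archive, so treat it as a starting point rather than a complete detector.

```php
<?php
const ZIP_LOCAL_HEADER = "PK\x03\x04";       // starts each stored file entry
const ZIP_CENTRAL_HEADER = "PK\x01\x02";     // starts each central directory entry
const ZIP_END_OF_CENTRAL_DIR = "PK\x05\x06"; // end-of-central-directory (EOCD) record

/**
 * Hypothetical helper: return the byte offset of the first local file header
 * signature in the file, or null if the signature never appears.
 */
function findFirstZipSignature(string $path, int $chunkSize = 65536): ?int
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    $offset = 0;  // file offset of the first byte currently held in $buffer
    $buffer = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buffer .= $chunk;
        $pos = strpos($buffer, ZIP_LOCAL_HEADER);
        if ($pos !== false) {
            fclose($handle);
            return $offset + $pos;
        }
        // Keep the last 3 bytes so a signature split across two chunks is still found.
        $keep = min(3, strlen($buffer));
        $offset += strlen($buffer) - $keep;
        $buffer = substr($buffer, -$keep);
    }
    fclose($handle);
    return null;
}

/**
 * Hypothetical heuristic: find the EOCD record near the end of the file and
 * check that its fields point at something that looks like a central directory.
 */
function looksLikeZip(string $path): bool
{
    $size = filesize($path);
    if ($size === false || $size < 22) { // the EOCD record alone is 22 bytes
        return false;
    }
    // The EOCD is 22 bytes plus an optional comment of up to 65535 bytes.
    $tailLength = min($size, 22 + 65535);
    $tail = file_get_contents($path, false, null, $size - $tailLength, $tailLength);
    if ($tail === false) {
        return false;
    }
    $pos = strrpos($tail, ZIP_END_OF_CENTRAL_DIR);
    if ($pos === false || strlen($tail) - $pos < 22) {
        return false;
    }
    // The 18 bytes after the signature, all little-endian.
    $eocd = unpack(
        'vdisk/vcdDisk/ventriesOnDisk/ventries/VcdSize/VcdOffset/vcommentLength',
        substr($tail, $pos + 4, 18)
    );
    $eocdOffset = $size - $tailLength + $pos;
    if ($eocdOffset + 22 + $eocd['commentLength'] > $size || $eocd['cdSize'] > $eocdOffset) {
        return false; // comment or central directory would not fit in the file
    }
    if ($eocd['entries'] === 0) {
        return $eocd['cdSize'] === 0; // an empty archive has no central directory
    }
    // The central directory sits directly before the EOCD, even when a stub is prepended.
    $cdStart = $eocdOffset - $eocd['cdSize'];
    return file_get_contents($path, false, null, $cdStart, 4) === ZIP_CENTRAL_HEADER;
}
```

Calling findFirstZipSignature("an_unknown_file.abc") gives the offset where the embedded zip data appears to start (or null), and looksLikeZip("an_unknown_file.abc") is the sanity check that the match is more than a coincidence. Locating the central directory relative to the EOCD position, rather than trusting the stored offset, is a deliberate choice here so that files with an .exe stub in front still pass.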