
Detect Excel .xlsx file MIME type via PHP


I can't detect the MIME type of an Excel .xlsx file via PHP because the file is a zip archive.

The file utility

file file.xlsx
file.xlsx: Zip archive data, at least v2.0 to extract

PECL fileinfo

$finfo = finfo_open(FILEINFO_MIME_TYPE);
echo finfo_file($finfo, "file.xlsx");
application/zip

How can I validate it? Unpack it and inspect the structure? But what if it's an archive bomb?


Solution

  • I know this works for zip files, but I'm not too sure about xlsx files. It's worth a try:

    To list the files in a zip archive:

    $zip = new ZipArchive;
    $res = $zip->open('test.zip');
    if ($res === TRUE) {
        for ($i=0; $i<$zip->numFiles; $i++) {
            print_r($zip->statIndex($i));
        }
        $zip->close();
    } else {
        echo 'failed, code:' . $res;
    }
    

    This will print all the files like this:

    Array
    (
        [name] => file.png
        [index] => 2
        [crc] => -485783131
        [size] => 1486337
        [mtime] => 1311209860
        [comp_size] => 1484832
        [comp_method] => 8
    )
    

    As you can see, it gives the size and the comp_size for each file in the archive. If it is an archive bomb, the ratio between these two numbers will be astronomical. You could simply set a limit, in megabytes, on the maximum decompressed size; if a file exceeds that limit, skip it and return an error message to the user, otherwise proceed with the extraction. See the manual for more information.
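
The size check described above could be sketched roughly as follows. This is only an illustration, not a tested solution: the function name `checkXlsxArchive`, the 50 MB total limit and the 100:1 ratio limit are all made up for the example, and a ratio limit that suits real spreadsheets may well be different, since the XML inside an .xlsx file compresses well. The check for `[Content_Types].xml` and `xl/workbook.xml` assumes the layout that Excel conventionally writes; other producers may name the workbook part differently. Nothing is extracted, only the archive's directory is read.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Reads the zip directory only; no entry is decompressed.
// $maxBytes and $maxRatio are illustrative limits, not recommended values.
function checkXlsxArchive(
    string $path,
    int $maxBytes = 50 * 1024 * 1024,
    float $maxRatio = 100.0
): array {
    $zip = new ZipArchive;
    $res = $zip->open($path);
    if ($res !== true) {
        return ['ok' => false, 'reason' => 'not a readable zip, code: ' . $res];
    }

    $total = 0;
    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $stat = $zip->statIndex($i);
        if ($stat === false) {
            $zip->close();
            return ['ok' => false, 'reason' => 'cannot stat entry ' . $i];
        }
        $names[$stat['name']] = true;
        $total += $stat['size'];

        // Compare declared uncompressed size with compressed size.
        // Entries with comp_size 0 are skipped to avoid dividing by zero.
        if ($stat['comp_size'] > 0
            && $stat['size'] / $stat['comp_size'] > $maxRatio) {
            $zip->close();
            return ['ok' => false, 'reason' => 'suspicious ratio: ' . $stat['name']];
        }
    }
    $zip->close();

    if ($total > $maxBytes) {
        return ['ok' => false, 'reason' => 'declared size too large'];
    }

    // Assumption: an .xlsx written by Excel contains both of these entries.
    foreach (['[Content_Types].xml', 'xl/workbook.xml'] as $required) {
        if (!isset($names[$required])) {
            return ['ok' => false, 'reason' => 'missing entry: ' . $required];
        }
    }

    return ['ok' => true, 'reason' => ''];
}
```

Calling `checkXlsxArchive('file.xlsx')` returns an array with an `ok` flag and a `reason` string. Keep in mind that `size` and `comp_size` are the values the archive itself declares, so this is best treated as a first filter before extraction rather than a guarantee.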