
List files in .7z, .rar and .tar archives using PHP


I want to list the files inside an archive without extracting them.

The types of archives I am interested in:

  • .7z (7-Zip)
  • .rar (WinRAR)
  • .tar (POSIX, e.g. GNU tar)
  • .zip (ISO standard, e.g. WinZip)

For .zip files, I have been able to achieve this:

<?php
    $za = new ZipArchive();
    $za->open('theZip.zip');
    for ($i = 0; $i < $za->numFiles; $i++) {
        $stat = $za->statIndex($i);
        print_r(basename($stat['name']) . PHP_EOL);
    }
?>

However, I have not managed to do the same for .7z files. Haven’t tested .rar and .tar, but will need them as well.


Solution

  • This is something that has come up before, in several earlier questions and for various reasons (one of them has an answer whose links are now broken).

    The prevailing opinion at the moment is to write a wrapper (DIY, or from a library) that relies on a 7-Zip binary (executable) being available on the server and calls it via exec(), rather than to use a pure PHP solution.
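
    A minimal sketch of the exec() route, assuming a `7z` executable (7-Zip or p7zip) is installed and on the server's PATH. It runs `7z l -slt`, whose "technical" listing prints a `Path = ...` line for each entry after a `----------` separator. The function names `list7zEntries()` and `parse7zSltListing()` are hypothetical, not part of any library.

```php
<?php
// Sketch: list the entries of a .7z archive by shelling out to the 7-Zip CLI.
// Assumption: a `7z` binary is on PATH (pass another name, e.g. '7za', if not).

/**
 * Parse the output lines of `7z l -slt <archive>` and return the entry paths.
 * Lines before the "----------" separator describe the archive itself
 * (including its own "Path = ..." line), so they are skipped.
 */
function parse7zSltListing(array $lines): array
{
    $paths = [];
    $inEntries = false;
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n"); // tolerate Windows line endings
        if ($line === '----------') {
            $inEntries = true; // per-entry blocks start after this line
            continue;
        }
        if ($inEntries && strncmp($line, 'Path = ', 7) === 0) {
            $paths[] = substr($line, 7);
        }
    }
    return $paths;
}

/**
 * Run `7z l -slt` on $archive and return its entry paths (files and folders).
 * Any non-zero exit code is treated as a failure.
 */
function list7zEntries(string $archive, string $binary = '7z'): array
{
    $cmd = escapeshellarg($binary) . ' l -slt ' . escapeshellarg($archive) . ' 2>&1';
    $output = [];
    $status = 0;
    exec($cmd, $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("7z exited with status $status: " . implode("\n", $output));
    }
    return parse7zSltListing($output);
}
```

    Parsing is kept separate from exec() so it can be tested without the binary, and both arguments go through escapeshellarg() so file names cannot inject shell commands. Treating every non-zero exit code as an error also rejects archives for which 7-Zip only reports a warning.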

    The 7z format supports a variety of compression methods, but its default is LZMA/LZMA2, so I'm assuming you want a pure PHP implementation for reading/decompressing LZMA. LZMA SDKs are available for C, C++, C# and Java, and someone has made a PHP extension for LZMA2 (with a fork for LZMA). However, even though there has been interest on the 7-Zip forums for quite a while, no one seems to have ported this to a comprehensive PECL extension or to pure PHP yet.

    Depending on your needs & motivation, this leaves you with:

    • add the 7-zip binary to your server, and use a wrapper library, be it your own or someone else's
    • install and use an unofficial PECL extension
    • bravely port the LZMA SDK to PHP yourself (and hopefully contribute it back to open source!)

    For the other formats, the PHP documentation has examples and usage details: RarArchive for .rar, PharData for .tar and ZipArchive for .zip.

    Since these all rely on PHP extensions (RAR support in particular is only available through PECL), if your webhost limits you in some way and you need a pure PHP solution, it might be easier to move to a more amenable webhost.
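
    Before choosing, it can help to check what the host actually provides. A small sketch, assuming you only need to know whether each extension is loaded and whether exec() is blocked through `disable_functions`. The `archiveSupport()` helper and its keys are hypothetical, and a `true` for `7z-exec` only means exec() is callable, not that a 7-Zip binary is installed.

```php
<?php
// Sketch: report which of the listing approaches this PHP install supports.
// The array keys are hypothetical labels, not names used by PHP itself.

function archiveSupport(): array
{
    // disable_functions is a comma-separated list (empty or false if unset)
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));

    return [
        'rar'     => extension_loaded('rar') && class_exists('RarArchive'),
        'zip'     => extension_loaded('zip') && class_exists('ZipArchive'),
        'tar'     => extension_loaded('phar') && class_exists('PharData'),
        'tar.gz'  => extension_loaded('phar') && extension_loaded('zlib'),
        // exec() available; says nothing about whether `7z` is on PATH
        '7z-exec' => function_exists('exec') && !in_array('exec', $disabled, true),
    ];
}

foreach (archiveSupport() as $format => $ok) {
    echo str_pad($format, 8), $ok ? 'yes' : 'no', PHP_EOL;
}
```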

    To attempt to protect against zip bombs, you can look at the compression ratio, as suggested in another answer (unpacked size divided by packed size), and treat anything over a certain threshold as invalid. However, the zip bomb discussed in an answer to one of the linked questions suggests that this can be ineffective against multi-layered zip bombs. For those, you would also need to check whether the files you're listing are themselves archives, make sure you're not doing any kind of recursive extraction, and treat archives that contain archives as invalid.
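
    A sketch of both checks for a .zip, using the `size` and `comp_size` values that ZipArchive::statIndex() returns. The threshold of 100 and the extension list used to spot nested archives are arbitrary assumptions, and the helper names are hypothetical.

```php
<?php
// Sketch: flag suspicious .zip entries without extracting anything.
// Assumptions: MAX_RATIO and ARCHIVE_EXTENSIONS are arbitrary example values.

const MAX_RATIO = 100.0; // unpacked/packed above this is treated as suspicious
const ARCHIVE_EXTENSIONS = ['zip', '7z', 'rar', 'tar', 'gz', 'tgz', 'bz2', 'xz'];

/** True if an entry's unpacked/packed ratio exceeds $maxRatio. */
function ratioTooHigh(int $size, int $compSize, float $maxRatio = MAX_RATIO): bool
{
    if ($compSize <= 0) {
        // 0/0 (e.g. directories, empty files) is fine; data from 0 bytes is not
        return $size > 0;
    }
    return ($size / $compSize) > $maxRatio;
}

/** True if the entry name looks like another archive (extension check only). */
function looksLikeArchive(string $name): bool
{
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    return in_array($ext, ARCHIVE_EXTENSIONS, true);
}

/** Names of suspicious entries in the .zip at $path (empty array = none). */
function suspiciousZipEntries(string $path, float $maxRatio = MAX_RATIO): array
{
    $zip = new ZipArchive();
    $result = $zip->open($path);
    if ($result !== true) {
        throw new RuntimeException("Cannot open $path (ZipArchive error $result)");
    }
    $flagged = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $stat = $zip->statIndex($i);
        if ($stat === false) {
            continue;
        }
        if (ratioTooHigh($stat['size'], $stat['comp_size'], $maxRatio)
            || looksLikeArchive($stat['name'])) {
            $flagged[] = $stat['name'];
        }
    }
    $zip->close();
    return $flagged;
}
```

    The sizes come from the archive's own headers, so this only catches bombs that declare their sizes honestly; a robust defence also caps how many bytes are actually decompressed. Checking file signatures instead of extensions would make the nested-archive check harder to bypass.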

    For completeness, here are some usage examples with the official extensions:

    RAR:

    <?php
    // open the archive file
    $archive = RarArchive::open('archive.rar');
    // make sure it's valid
    if ($archive === false) return;
    // retrieve a list of entries in the archive
    $entries = $archive->getEntries();
    // make sure the entry list is valid
    if ($entries === false) return;
    // example output of entry count
    echo "Found ".count($entries)." entries.\n";
    // loop over entries
    foreach ($entries as $e) {
        echo $e->getName()."\n";
    }
    // close the archive file
    $archive->close();
    ?>
    

    TAR:

    <?php
    // open the archive file
    try {
        $archive = new PharData('archive.tar');
    }
    // make sure it's valid
    catch (UnexpectedValueException $e) {
        return;
    }
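    // make sure the archive is not empty
    if ($archive->count() === 0) return;
    // example output of entry count
    echo "Found ".$archive->count()." entries.\n";
    // loop over entries: PharData iterates like a directory, so wrap it in a
    // RecursiveIteratorIterator to also reach files inside subdirectories
    foreach (new RecursiveIteratorIterator($archive) as $entry) {
        // each $entry is a PharFileInfo; getPathname() gives its phar:// path
        echo $entry->getPathname()."\n";
    }
    // no need to close a PharData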
    ?>
    

    ZIP (adapted from the OP's code in the question):

    <?php
    // open the archive file
    $archive = new ZipArchive;
    $valid = $archive->open('archive.zip');
    // make sure it's valid (if not ZipArchive::open() returns various error codes)
    if ($valid !== true) return;
    // make sure the archive is not empty
    if ($archive->numFiles === 0) return;
    // example output of entry count
    echo "Found ".$archive->numFiles." entries.\n";
    // loop over entries
    for ($i = 0; $i < $archive->numFiles; $i++) {
        $e = $archive->statIndex($i);
        echo $e['name']."\n";
    }
    // close the archive file (redundant as called automatically at the end of the script)
    $archive->close();
    ?>
    

    GZ:

    Since gzip (GNU zip) is a compression format rather than an archive format, PHP handles it differently. If you open a .gz file on its own (rather than treating it as a .tar) with gzopen(), any reads from it are transparently decompressed. Since the most common case is .tar.gz, you can also open it with PharData just like the .tar above, or use PharData::decompress() to produce an uncompressed .tar copy of it.
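
    A sketch of both cases, assuming the zlib extension is loaded: `readGz()` reads a standalone .gz through gzopen()/gzread(), and `listTarGz()` lists a .tar.gz with PharData exactly as for a plain .tar. Both function names are hypothetical.

```php
<?php
// Sketch: a plain .gz read via gzopen(), and a .tar.gz listed via PharData.
// Assumption: the zlib extension is loaded (Phar is bundled with PHP).

/** Read a whole .gz file; gzread() hands back already-decompressed bytes. */
function readGz(string $path): string
{
    $fh = gzopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $data = '';
    while (!gzeof($fh)) {
        $chunk = gzread($fh, 8192);
        if ($chunk === false) {
            gzclose($fh);
            throw new RuntimeException("Read error in $path");
        }
        $data .= $chunk;
    }
    gzclose($fh);
    return $data;
}

/**
 * List the files in a .tar.gz. PharData decompresses transparently, so this
 * is the same code as for a plain .tar. Returns phar:// paths to each file.
 */
function listTarGz(string $path): array
{
    $archive = new PharData($path); // throws UnexpectedValueException if invalid
    $files = [];
    foreach (new RecursiveIteratorIterator($archive) as $entry) {
        $files[] = $entry->getPathname();
    }
    return $files;
}
```

    The phar:// paths returned by listTarGz() can be passed straight to functions like file_get_contents() if you later need an entry's contents.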