
List zip file entries with special characters like emojis


I'm writing a script that needs to list the entries in a zip file. My problem is that when an entry name contains an emoji, the CLI doesn't output the file name correctly:

❯ zip -r foo.zip test/
  adding: test/ (stored 0%)
  adding: test/😊.txt (stored 0%)

src on main [!?] is 📦 v1.0.0 via 🤖 v16.14.0 
❯ unzip -l foo.zip 
Archive:  foo.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  04-08-2022 20:54   test/
        0  04-08-2022 20:54   test/�???.txt  <---- here is my problem
---------                     -------
        0                     2 files

src on main [!?] is 📦 v1.0.0 via 🤖 v16.14.0 
❯ unzip foo.zip test/😊.txt
Archive:  foo.zip
 extracting: test/�???.txt

Is there a way to make unzip list entries correctly when their names contain special characters?

Thanks!


Solution

  • It doesn't seem possible to accurately list the files in a zip archive with unzip (tested with unzip 6.00); you'll have to use another tool.

    I chose Perl for this answer because the required functionality is in its core library. Here I used a newline as the delimiter (-l), but you should replace it with a NUL byte (-l0) if you want to read and process the output paths 100% accurately from bash:

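    perl -l -e '
        use IO::Uncompress::Unzip qw($UnzipError);
        # Opening the archive positions the reader on its first member.
        $zip = IO::Uncompress::Unzip->new($ARGV[0])
            or die "Cannot open $ARGV[0]: $UnzipError\n";
        # Print the current member name, then advance to the next member.
        for ($status = 1; $status > 0; $status = $zip->nextStream()) {
            print $zip->getHeaderInfo()->{Name}
        }
    ' foo.zip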
    
    test/
    test/😊.txt
    

    Remark: Python also has a zipfile module in its standard library. I didn't post a Python solution because of the encoding issues with its stdout; the fixes aren't compatible across Python versions.