I'm writing a script that needs to list the file entries in a zip file. My problem is that when an entry name contains an emoji, the CLI doesn't output the file name correctly:
❯ zip -r foo.zip test/
adding: test/ (stored 0%)
adding: test/😊.txt (stored 0%)
src on main [!?] is 📦 v1.0.0 via 🤖 v16.14.0
❯ unzip -l foo.zip
Archive: foo.zip
Length Date Time Name
--------- ---------- ----- ----
0 04-08-2022 20:54 test/
0 04-08-2022 20:54 test/�???.txt <---- here is my problem
--------- -------
0 2 files
src on main [!?] is 📦 v1.0.0 via 🤖 v16.14.0
❯ unzip foo.zip test/😊.txt
Archive: foo.zip
extracting: test/�???.txt
Is there a way to tell unzip to list the file entries with consideration of special characters?
Thanks!
It doesn't seem possible to accurately list the files in a zip archive with unzip (tested with unzip 6.00); you'll have to use another tool.
I chose perl in my answer because it has the required functionality in its core library. Here I used a newline as the delimiter (-l), but you should replace it with a NUL byte (-l0) if you want to be able to read and process the output paths 100% accurately from bash:
perl -l -e '
    use IO::Uncompress::Unzip;
    $zip = IO::Uncompress::Unzip->new($ARGV[0]);
    while ($zip->nextStream()) {
        print $zip->getHeaderInfo()->{Name}
    }
' foo.zip
test/
test/😊.txt
Remark: Python also has a zipfile module in its standard library. I didn't post a Python solution because of the encoding issues with its stdout. The fixes aren't compatible between Python versions...
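For what it's worth, if you only need to support Python 3, one way around the stdout encoding issue is to skip the text layer and write raw bytes to sys.stdout.buffer. Below is a minimal sketch under that assumption; the helper names list_entries and write_nul_delimited are made up for illustration. Keep in mind that zipfile decodes entry names as UTF-8 only when the archive flags them as UTF-8 and falls back to cp437 otherwise, so an archive written without that flag may still come out garbled.

```python
import sys
import zipfile


def list_entries(zip_path):
    """Return the entry names stored in the archive, in archive order."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


def write_nul_delimited(names, stream):
    """Write each name to a binary stream as UTF-8 bytes followed by a NUL byte."""
    for name in names:
        stream.write(name.encode("utf-8") + b"\0")


# Only runs when an archive path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.stdout.buffer is the underlying binary stream (Python 3 only),
    # so the locale's stdout encoding is never involved.
    write_nul_delimited(list_entries(sys.argv[1]), sys.stdout.buffer)
```

Saved under a hypothetical name such as listzip.py, the output can be consumed from bash with a loop like while IFS= read -r -d '' path; do ...; done, the same way as the perl -l0 variant.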