Our product has an export function that uses ZipOutputStream to zip a directory. However, when the directory contains file names with Chinese or Japanese characters, the export doesn't work properly: the files inside the zip show up under different names than the originals. Here is an example of our zipping code:
ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFileName));
out.setEncoding("UTF-8");
//program to add directory to zip
//program add/create file to zip
out.close();
My import algorithm, also built in Java, can import the zipped file correctly, even if it contains Chinese/Japanese characters in file/directory names.
ZipFile zipFile = new ZipFile(zipPath, "UTF-8");
Enumeration e = zipFile.getEntries();
while (e.hasMoreElements()) {
ZipEntry entry = (ZipEntry) e.nextElement();
String name = entry.getName();
....
Is the unzip software having trouble with the UTF-8 encoded file names, or is something special needed to create a zip file whose UTF-8 encoded names can be read correctly by existing zip software?
I have written an example program:
package ZipFile;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.tools.zip.ZipEntry;
import org.apache.tools.zip.ZipOutputStream;
public class ZipFolder {
    public static void main(String[] a) throws Exception
    {
        String srcFolder = "D:/9.4_work/openscript_repo/中文124.All/中文";
        String destZipFile = "D:/Eclipse_Projects/OpenScriptDebuggingProject/src/ZipFile/demo.zip";
        zipFolder(srcFolder, destZipFile);
    }
    static public void zipFolder(String srcFolder, String destZipFile) throws Exception
    {
        ZipOutputStream zip = null;
        FileOutputStream fileWriter = null;
        fileWriter = new FileOutputStream(destZipFile);
        zip = new ZipOutputStream(fileWriter);
        zip.setEncoding("UTF-8");
        // using GBK encoding, the chinese name can be correctly displayed when unzip
        // zip.setEncoding("GBK");
        addFolderToZip("", srcFolder, zip);
        zip.flush();
        zip.close();
    }
    static private void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception
    {
        File folder = new File(srcFile);
        if (folder.isDirectory()) {
            addFolderToZip(path, srcFile, zip);
        }
        else {
            byte[] buf = new byte[1024];
            int len;
            FileInputStream in = new FileInputStream(srcFile);
            try {
                zip.putNextEntry(new ZipEntry(path + "/" + folder.getName()));
                while ((len = in.read(buf)) > 0) {
                    zip.write(buf, 0, len);
                }
                zip.closeEntry();
            } finally {
                // release the file handle even if writing the entry fails
                in.close();
            }
        }
    }
    static private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception
    {
        File folder = new File(srcFolder);
        for (String fileName : folder.list()) {
            if (path.equals("")) {
                addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip);
            }
            else {
                addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip);
            }
        }
    }
}
The top answer here may answer your question; unfortunately it seems to suggest that the Zip format doesn't really allow for creating a Zip file that will display filenames properly on any computer:
https://superuser.com/questions/60379/linux-zip-tgz-filenames-encoding-problem
I expect it works when you set the encoding to GBK because that is your system's default encoding, so 7zip uses it to decode the file names in every zip file it opens.
The linked answer also suggests that the rar and 7z formats have better support.
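To illustrate the GBK point above, here is a minimal sketch of what presumably happens when a tool decodes UTF-8 encoded name bytes with a different charset. The class and method names (NameEncodingMismatch, misread) are made up for this example, and it assumes the JDK in use ships the GBK charset. The unicode escapes spell 中文.txt.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NameEncodingMismatch {
    /**
     * Encodes the name as UTF-8 (as the zip writer does with setEncoding("UTF-8"))
     * and decodes those bytes with the charset a reading tool assumes.
     */
    public static String misread(String name, Charset assumedByReader) {
        byte[] storedBytes = name.getBytes(StandardCharsets.UTF_8);
        return new String(storedBytes, assumedByReader);
    }

    public static void main(String[] args) {
        // Hypothetical file name; the escapes avoid source-encoding problems.
        String name = "\u4e2d\u6587.txt";
        // Assumption: GBK is available in this JDK.
        Charset gbk = Charset.forName("GBK");

        System.out.println("written as UTF-8  : " + name);
        System.out.println("decoded as UTF-8  : " + misread(name, StandardCharsets.UTF_8));
        System.out.println("decoded as GBK    : " + misread(name, gbk));
    }
}
```

The ASCII part of the name survives either way, which matches the symptom of files being "named differently" rather than missing.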
I also found a blog entry specifically about UTF-8 in zips with Java. It suggests there is a newer version of the ZIP specification that current versions of Java may not produce, but that Java 7 will. I don't know whether the Apache classes use it too.
http://blogs.oracle.com/xuemingshen/entry/non_utf_8_encoding_in
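For what it's worth, here is a rough sketch of the Java 7 route using java.util.zip rather than the Apache classes from the question. It assumes Java 8 or later (for UncheckedIOException); the class and helper names are invented for illustration. It writes one entry with a UTF-8 name, checks bit 11 of the general purpose flag in the first local file header (the ZIP specification's "names are UTF-8" marker), and reads the name back. Whether a particular unzip tool honors that flag is a separate matter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class Utf8ZipSketch {
    /** Bit 11 of the general purpose flag marks entry names as UTF-8. */
    private static final int UTF8_NAME_FLAG = 0x0800;

    /** Builds an in-memory zip with a single entry, encoding names as UTF-8. */
    public static byte[] zipOneEntry(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the flag field of the first local file header (bytes 6 and 7, little-endian). */
    public static boolean firstEntryHasUtf8Flag(byte[] zipBytes) {
        int flags = (zipBytes[6] & 0xFF) | ((zipBytes[7] & 0xFF) << 8);
        return (flags & UTF8_NAME_FLAG) != 0;
    }

    /** Returns the name of the first entry, decoding names as UTF-8. */
    public static String firstEntryName(byte[] zipBytes) {
        try (ZipInputStream in = new ZipInputStream(
                new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical entry name with Chinese characters, written as escapes.
        String name = "\u4e2d\u6587/\u6d4b\u8bd5.txt";
        byte[] zip = zipOneEntry(name, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("UTF-8 flag set: " + firstEntryHasUtf8Flag(zip));
        System.out.println("name survives : " + name.equals(firstEntryName(zip)));
    }
}
```

The flag check only looks at the first local header, which is enough for a single-entry archive like this one; a real archive would need each header inspected.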