Tags: java, multithreading, java-8, zip, nio

How can I unzip a huge zip file using multiple threads in Java (preferably Java 8)?


Referring to: http://www.pixeldonor.com/2013/oct/12/concurrent-zip-compression-java-nio/

I'm trying to unzip a 5 GB zip file. On average it takes about 30 minutes, which is too long for our app, so I'm trying to reduce the time.

I've tried many combinations: changing the buffer size (by default my write chunk is 4096 bytes), switching NIO methods, and trying different libraries. The results are all about the same.

One thing I haven't tried yet is splitting the zipped file into chunks and reading those chunks on multiple threads.

The code snippet is:

  private static ExecutorService e = Executors.newFixedThreadPool(20);
  public static void main(String argv[]) {
        try {
            String selectedZipFile = "/Users/xx/Documents/test123/large.zip";
            String selectedDirectory = "/Users/xx/Documents/test2";
            long st = System.currentTimeMillis();

            unzip(selectedDirectory, selectedZipFile);

            System.out.println(System.currentTimeMillis() - st);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


public static void unzip(String targetDir, String zipFilename) {
    ZipInputStream archive;
            try {
                List<ZipEntry> list = new ArrayList<>();
                archive = new ZipInputStream(new BufferedInputStream(new FileInputStream(zipFilename)));
                ZipEntry entry;
                while ((entry = archive.getNextEntry()) != null) {
                    list.add(entry);
                }

                for (List<ZipEntry> partition : Lists.partition(list, 1000)) {
                    e.submit(new Multi(targetDir, partition, archive));
                }
            } catch (Exception e){
                e.printStackTrace();
            }
}

and the runnable is:

  static class Multi implements Runnable {

    private List<ZipEntry> partition;
    private ZipInputStream zipInputStream;
    private String targetDir;

    public Multi(String targetDir, List<ZipEntry> partition, ZipInputStream zipInputStream) {
        this.partition = partition;
        this.zipInputStream = zipInputStream;
        this.targetDir = targetDir;
    }

    @Override
    public void run() {
        for (ZipEntry entry : partition) {
            File entryDestination = new File(targetDir, entry.getName());
            if (entry.isDirectory()) {
                entryDestination.mkdirs();
            } else {
                entryDestination.getParentFile().mkdirs();

                BufferedOutputStream output = null;
                try {
                    int n;
                    byte buf[] = new byte[BUFSIZE];
                    output = new BufferedOutputStream(new FileOutputStream(entryDestination), BUFSIZE);
                    while ((n = zipInputStream.read(buf, 0, BUFSIZE)) != -1) {
                        output.write(buf, 0, n);
                    }
                    output.flush();


                } catch (FileNotFoundException e1) {
                    e1.printStackTrace();
                } catch (IOException e1) {
                    e1.printStackTrace();
                } finally {

                    try {
                        output.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }

                }
            }
        }
    }
}

But for some reason it stores only the directories, without the files' content...

My question is: what is the right way to process a large zip file in chunks on multiple threads, along the lines of the "compression" article mentioned above?


Solution

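  • A ZipInputStream is a single sequential stream of data; it cannot be split. In the question's code, the getNextEntry() loop has already read the stream to its end before any task runs, so every later read() returns -1 and the files come out empty.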

    If you want multi-threaded unzipping, you need to use ZipFile. With Java 8 you even get the multi-threading for free.

    public static void unzip(String targetDir, String zipFilename) {
        Path targetDirPath = Paths.get(targetDir);
        try (ZipFile zipFile = new ZipFile(zipFilename)) {
            zipFile.stream()
                   .parallel() // enable multi-threading
                   .forEach(e -> unzipEntry(zipFile, e, targetDirPath));
        } catch (IOException e) {
            throw new RuntimeException("Error opening zip file '" + zipFilename + "': " + e, e);
        }
    }
    
    private static void unzipEntry(ZipFile zipFile, ZipEntry entry, Path targetDir) {
        try {
            Path targetPath = targetDir.resolve(Paths.get(entry.getName()));
            // Ask the entry itself: the target does not exist yet, so Files.isDirectory(targetPath) would be false
            if (entry.isDirectory()) {
                Files.createDirectories(targetPath);
            } else {
                Files.createDirectories(targetPath.getParent());
                try (InputStream in = zipFile.getInputStream(entry)) {
                    Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new RuntimeException("Error processing zip entry '" + entry.getName() + "': " + e, e);
        }
    }
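
    One caveat with the parallel stream above is that it runs on the shared common ForkJoinPool, so the thread count is not under your control. As a minimal sketch (not part of the original answer), the same ZipFile approach can be driven by an explicit fixed pool like the one in the question. The class name PooledUnzip, the threads parameter, and the check that rejects entries escaping the target directory are assumptions added for illustration.

    ```java
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    // Hypothetical class name, used only for this illustration.
    public class PooledUnzip {

        // Extracts every entry of zipFilename into targetDir, one task per entry,
        // on a fixed pool of `threads` workers.
        public static void unzip(String targetDir, String zipFilename, int threads)
                throws IOException, InterruptedException {
            Path root = Paths.get(targetDir).toAbsolutePath().normalize();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try (ZipFile zipFile = new ZipFile(zipFilename)) {
                List<Future<?>> pending = new ArrayList<>();
                for (ZipEntry entry : Collections.list(zipFile.entries())) {
                    // Returning null makes the lambda a Callable, so it may throw IOException.
                    pending.add(pool.submit(() -> {
                        extract(zipFile, entry, root);
                        return null;
                    }));
                }
                // Wait for all tasks here, while the ZipFile is still open.
                for (Future<?> f : pending) {
                    try {
                        f.get();
                    } catch (ExecutionException e) {
                        throw new IOException("Extraction failed", e.getCause());
                    }
                }
            } finally {
                pool.shutdown();
            }
        }

        private static void extract(ZipFile zipFile, ZipEntry entry, Path root) throws IOException {
            Path target = root.resolve(entry.getName()).normalize();
            // Assumption: refuse entries such as "../x" that would land outside targetDir.
            if (!target.startsWith(root)) {
                throw new IOException("Entry escapes target directory: " + entry.getName());
            }
            if (entry.isDirectory()) {
                Files.createDirectories(target);
            } else {
                Files.createDirectories(target.getParent());
                // Each call to getInputStream returns an independent stream for that entry.
                try (InputStream in = zipFile.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
    ```

    For example, PooledUnzip.unzip("/Users/xx/Documents/test2", "/Users/xx/Documents/test123/large.zip", 20) mirrors the pool size used in the question. Whether more threads actually help depends on the disk and CPU, so it is worth measuring.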
    

    You might also want to check out this answer, which uses FileSystem to access the zip file content, for a true Java 8 experience.
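
    To give a rough idea of what the FileSystem approach looks like, here is a minimal sketch. It is not the code from the answer referred to above; the class name ZipFsUnzip and the path-escape check are assumptions for illustration. It mounts the archive through the JDK's built-in zip file system provider and copies entries sequentially with Files.copy. Whether walking the zip file system in parallel would be faster is left as something to measure.

    ```java
    import java.io.IOException;
    import java.net.URI;
    import java.nio.file.FileSystem;
    import java.nio.file.FileSystems;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.Collections;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // Hypothetical class name, used only for this illustration.
    public class ZipFsUnzip {

        public static void unzip(String targetDir, String zipFilename) throws IOException {
            Path root = Paths.get(targetDir).toAbsolutePath().normalize();
            // A "jar:" URI wrapping the file URI selects the zip file system provider.
            URI zipUri = URI.create("jar:" + Paths.get(zipFilename).toAbsolutePath().toUri());
            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, Collections.emptyMap())) {
                Path zipRoot = zipFs.getPath("/");
                List<Path> entries;
                try (Stream<Path> walk = Files.walk(zipRoot)) {
                    entries = walk.collect(Collectors.toList());
                }
                for (Path source : entries) {
                    // Convert via String because the two paths belong to different file systems.
                    String relative = zipRoot.relativize(source).toString();
                    Path target = root.resolve(relative).normalize();
                    // Assumption: refuse anything that would land outside targetDir.
                    if (!target.startsWith(root)) {
                        throw new IOException("Entry escapes target directory: " + source);
                    }
                    if (Files.isDirectory(source)) {
                        Files.createDirectories(target);
                    } else {
                        Files.createDirectories(target.getParent());
                        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
    }
    ```

    Here Files.isDirectory is applied to the path inside the archive, which already exists, rather than to the extraction target.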