Tags: java, compression, gzip, binary-files

Decompress large binary files


I have a function to decompress large gzip files using the method below. There are times when I run into an OutOfMemoryError because the file is just too large. Is there a way I can optimize my code? I have read something about breaking the file into smaller parts that fit into memory and decompressing those, but I don't know how to do that. Any help or suggestion is appreciated.

private static String decompress(String s){
    String pathOfFile = null;

    try(BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(new FileInputStream(s)), Charset.defaultCharset()))){
        File file = new File(s);
        FileOutputStream fos = new FileOutputStream(file);

        String line;
        while((line = reader.readLine()) != null){
            fos.write(line.getBytes());
            fos.flush();
        }

        pathOfFile = file.getAbsolutePath();
    } catch (IOException e) {
        e.printStackTrace();
    }

    return pathOfFile;
}

The stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
        at java.base/java.util.ArrayList.grow(ArrayList.java:237)
        at java.base/java.util.ArrayList.ensureCapacity(ArrayList.java:217)

Solution

  • Don't use Reader classes: the decompressed content is binary data, not text, so there is no reason to decode it into characters or process it line by line. Worse, BufferedReader.readLine() has to buffer an entire line in memory, and binary data may contain no newline bytes at all, so a single "line" can be nearly the whole file; that is exactly how the heap gets exhausted. Read and write raw bytes instead with InputStream.transferTo(), which streams the data through a fixed-size buffer:

    try (var in = new GZIPInputStream(new FileInputStream(inFile));
         var out = new FileOutputStream(outFile)) {
        // copies the stream in fixed-size chunks, so memory use stays constant
        in.transferTo(out);
    }
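
    Note that InputStream.transferTo() was added in Java 9. On Java 8 or earlier you can get the same constant-memory behavior with a plain buffer loop (a minimal sketch, reusing the same inFile and outFile):

    byte[] buffer = new byte[8192]; // fixed-size buffer keeps memory use constant
    try (InputStream in = new GZIPInputStream(new FileInputStream(inFile));
         OutputStream out = new FileOutputStream(outFile)) {
        int n;
        while ((n = in.read(buffer)) != -1) { // read up to 8 KB per iteration
            out.write(buffer, 0, n);          // write exactly the bytes read
        }
    }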
    

    Also, you don't need to call flush() at all here: FileOutputStream is unbuffered, so its flush() is a no-op, and calling it after every write is wasteful in general. A complete rewrite of the method is sketched in the next bullet.
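
  • A complete rewrite of the original decompress method along these lines might look like the minimal sketch below (assuming Java 9+ for transferTo()). It writes to a separate output file rather than opening a FileOutputStream on the same path it is still reading from, as the original code did; deriving the output name by stripping a .gz suffix is only an assumption for illustration:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPInputStream;

    class GzipDecompress {
        /**
         * Decompresses the gzip file at the given path and returns the
         * absolute path of the decompressed output file.
         */
        static String decompress(String s) throws IOException {
            // Write to a different file: opening a FileOutputStream on the
            // path that is still being read would truncate the source.
            // Stripping ".gz" for the output name is only an assumption.
            String outPath = s.endsWith(".gz") ? s.substring(0, s.length() - 3)
                                               : s + ".out";
            try (InputStream in = new GZIPInputStream(new FileInputStream(s));
                 OutputStream out = new FileOutputStream(outPath)) {
                in.transferTo(out); // copies in fixed-size chunks, constant memory
            }
            return new File(outPath).getAbsolutePath();
        }
    }

    Unlike the original, this version propagates the IOException instead of swallowing it and returning null, so the caller can tell success from failure.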