
Java zip files from streams instantly without using byte[]


I want to compress multiple files into a single zip file and then download it to the client. I'm dealing with big files. For the moment I'm using this:

@RequestMapping(value = "/download", method = RequestMethod.GET, produces = "application/zip")
public ResponseEntity<StreamingResponseBody> getFile() throws Exception {
    // getStreamsFromAzure(), generateFileName() and readAndWrite() are my own helpers (not shown)
    File zippedFile = new File("test.zip");
    // Step 1: zip everything into test.zip on the server
    try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zippedFile))) {
        InputStream[] streams = getStreamsFromAzure();
        for (InputStream stream : streams) {
            addToZipFile(zos, stream);
        }
    } // closing zos writes the zip's central directory; the file is incomplete without it
    long fileLength = zippedFile.length();

    // Step 2: only now stream the finished file to the client
    StreamingResponseBody stream = outputStream -> {
        try (InputStream fecFile = new FileInputStream(zippedFile)) {
            readAndWrite(fecFile, outputStream);
        }
    };

    return ResponseEntity.ok()
        .header(HttpHeaders.ACCESS_CONTROL_EXPOSE_HEADERS, HttpHeaders.CONTENT_DISPOSITION)
        .header(HttpHeaders.CONTENT_DISPOSITION, "attachment;filename=" + "download.zip")
        .contentLength(fileLength)
        .contentType(MediaType.parseMediaType("application/zip"))
        .body(stream);
}

private void addToZipFile(ZipOutputStream zos, InputStream fis) throws IOException {
    ZipEntry zipEntry = new ZipEntry(generateFileName());
    zos.putNextEntry(zipEntry);
    byte[] bytes = new byte[1024];
    int length;
    while ((length = fis.read(bytes)) >= 0) {
        zos.write(bytes, 0, length);
    }
    zos.closeEntry();
    fis.close();
}

This takes a long time: all the files have to be zipped before the download starts, and for large files that can take very long. This is the loop responsible for the delay:

while ((length = fis.read(bytes)) >= 0) {
    zos.write(bytes, 0, length);
}

So is there a way to start the download immediately, while the files are still being zipped?


Solution

  • Try this instead. Rather than having the ZipOutputStream wrap a FileOutputStream, writing the zip to a file and then copying that file to the client's output stream, have the ZipOutputStream wrap the client's output stream directly, so that every zip entry and every byte of data you add goes straight to the client. If you also want to keep a copy of the zip on the server, make the ZipOutputStream write to a split (tee) output stream that writes to both destinations at once.

    @RequestMapping(value = "/download", method = RequestMethod.GET, produces = "application/zip")
    public ResponseEntity<StreamingResponseBody> getFile() throws Exception {
    
        InputStream[] streamsToZip = getStreamsFromAzure();
    
        // You could cache already created zip files, maybe something like this:
        //   String[] pathsOfResourcesToZip = getPathsFromAzure();
        //   String zipId = getZipId(pathsOfResourcesToZip);
        //   if(isZipExist(zipId))
        //     // return that zip file
        //   else do the following
    
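        StreamingResponseBody streamResponse = clientOut -> {
            try (FileOutputStream zipFileOut = new FileOutputStream("test.zip")) {
                ZipOutputStream zos = new ZipOutputStream(new SplitOutputStream(clientOut, zipFileOut));
                for (InputStream in : streamsToZip) {
                    addToZipFile(zos, in);
                }
                // finish() writes the zip's central directory without closing clientOut;
                // without it the client receives an incomplete archive.
                zos.finish();
                zos.flush();
            } // closes test.zip; clientOut is left for Spring to close
        };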
    
        return ResponseEntity.ok()
                .header(HttpHeaders.ACCESS_CONTROL_EXPOSE_HEADERS, HttpHeaders.CONTENT_DISPOSITION)
                .header(HttpHeaders.CONTENT_DISPOSITION, "attachment;filename=" + "download.zip")
                .contentType(MediaType.parseMediaType("application/zip")).body(streamResponse);
    }
    
    
    private void addToZipFile(ZipOutputStream zos, InputStream fis) throws IOException {
        ZipEntry zipEntry = new ZipEntry(generateFileName());
        zos.putNextEntry(zipEntry);
        byte[] bytes = new byte[1024];
        int length;
        while ((length = fis.read(bytes)) >= 0) {
            zos.write(bytes, 0, length);
        }
        zos.closeEntry();
        fis.close();
    }
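If you are on Java 9 or later, the manual 1024-byte copy loop in addToZipFile can be replaced by InputStream.transferTo, and try-with-resources closes the source stream even when the copy fails. This is a minimal sketch under that assumption, not the answer's code: the class name TransferToZipDemo and the explicit entryName parameter (standing in for generateFileName()) are hypothetical. Note that transferTo still copies through an internal buffer; it only hides the loop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class TransferToZipDemo {

    // Same job as addToZipFile, but the copy loop is delegated to
    // InputStream.transferTo (Java 9+) and the source stream is always closed.
    // entryName is a hypothetical parameter replacing generateFileName().
    static void addToZipFile(ZipOutputStream zos, InputStream in, String entryName) throws IOException {
        try (InputStream source = in) {
            zos.putNextEntry(new ZipEntry(entryName));
            source.transferTo(zos);
            zos.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            addToZipFile(zos, new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)), "a.txt");
        }
        // Read the archive back to show the entry survived the round trip.
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            ZipEntry entry = zis.getNextEntry();
            System.out.println(entry.getName() + ": " + new String(zis.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

Running main prints `a.txt: hello`.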
    

    public static class SplitOutputStream extends OutputStream {
        private final OutputStream out1;
        private final OutputStream out2;
    
        public SplitOutputStream(OutputStream out1, OutputStream out2) {
            this.out1 = out1;
            this.out2 = out2;
        }
    
        @Override public void write(int b) throws IOException {
            out1.write(b);
            out2.write(b);
        }
    
        @Override public void write(byte[] b) throws IOException {
            out1.write(b);
            out2.write(b);
        }
    
        @Override public void write(byte[] b, int off, int len) throws IOException {
            out1.write(b, off, len);
            out2.write(b, off, len);
        }
    
        @Override public void flush() throws IOException {
            out1.flush();
            out2.flush();
        }
    
        /** Closes all the streams. If there was an IOException this throws the first one. */
        @Override public void close() throws IOException {
            IOException ioException = null;
            for (OutputStream o : new OutputStream[] {
                    out1,
                    out2 }) {
                try {
                    o.close();
                } catch (IOException e) {
                    if (ioException == null) {
                        ioException = e;
                    }
                }
            }
            if (ioException != null) {
                throw ioException;
            }
        }
    }
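To check the streaming approach outside Spring, the sketch below writes a zip through a tee into two in-memory buffers, one standing in for clientOut and one for test.zip, and then reads the client's copy back. It assumes both targets can be ByteArrayOutputStreams; TeeZipDemo, TeeOutputStream and streamZip are hypothetical names, and TeeOutputStream is a trimmed copy of SplitOutputStream so the example compiles on its own. The point it illustrates is finish(): it writes the archive's trailing central directory while leaving the underlying stream open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class TeeZipDemo {

    // Trimmed SplitOutputStream: every byte is written to both targets.
    static class TeeOutputStream extends OutputStream {
        private final OutputStream a;
        private final OutputStream b;

        TeeOutputStream(OutputStream a, OutputStream b) {
            this.a = a;
            this.b = b;
        }

        @Override public void write(int x) throws IOException { a.write(x); b.write(x); }

        @Override public void write(byte[] buf, int off, int len) throws IOException {
            a.write(buf, off, len);
            b.write(buf, off, len);
        }

        @Override public void flush() throws IOException { a.flush(); b.flush(); }
    }

    // Zips the entries into target, then calls finish() so the central
    // directory is written while target itself stays open.
    static void streamZip(OutputStream target, String[] names, String[] contents) throws IOException {
        ZipOutputStream zos = new ZipOutputStream(target);
        for (int i = 0; i < names.length; i++) {
            zos.putNextEntry(new ZipEntry(names[i]));
            zos.write(contents[i].getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        zos.finish();
        zos.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream client = new ByteArrayOutputStream(); // stands in for clientOut
        ByteArrayOutputStream file = new ByteArrayOutputStream();   // stands in for test.zip
        streamZip(new TeeOutputStream(client, file), new String[] {"a.txt", "b.txt"}, new String[] {"first", "second"});

        System.out.println(Arrays.equals(client.toByteArray(), file.toByteArray())); // true
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(client.toByteArray()))) {
            for (ZipEntry e; (e = zis.getNextEntry()) != null; ) {
                System.out.println(e.getName() + " -> " + new String(zis.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }
}
```

Running main prints true, followed by each entry name with its contents.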
    

    For the first request for a given set of resources you won't know the size of the resulting zip file, so you can't send a Content-Length with the response: the file is streamed to the client while it is still being zipped.

    But if you expect repeated requests for the same set of resources, you can cache the zip files and simply return the cached file on subsequent requests. In that case you also know the length of the cached zip, so you can send it in the response as well.

    To do this you need to create the same identifier consistently for each combination of resources, so that you can check whether those resources were already zipped and return the cached file if they were. You could, for example, sort the ids (perhaps the full paths) of the resources to be zipped and concatenate them to form an id for the zip file.
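A minimal sketch of that id scheme, assuming the resources are identified by path strings: the paths are sorted so the order of the request does not matter, joined with a newline, and then hashed with SHA-256 to get a short, file-name-safe id. The hashing step is an addition to the sort-and-concatenate idea, and the class name ZipCacheKey and the example paths are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ZipCacheKey {

    // Stable id for a set of resource paths: sorting makes it order-independent,
    // hashing keeps it short and safe to use as a file name.
    static String zipId(List<String> resourcePaths) {
        List<String> sorted = new ArrayList<>(resourcePaths);
        Collections.sort(sorted);
        // Newline separator (assumes paths never contain newlines), so that
        // ["ab", "c"] and ["a", "bc"] do not join to the same string.
        String joined = String.join("\n", sorted);
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(joined.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String first = zipId(List.of("container/b.csv", "container/a.csv"));
        String second = zipId(List.of("container/a.csv", "container/b.csv"));
        System.out.println(first.equals(second)); // true: same set, different order
        System.out.println(first + ".zip");
    }
}
```

A cached archive could then be stored under id + ".zip", and its size (for example via java.nio.file.Files.size) gives the content length for later responses.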