
How can I download a single file from a large remote zip file in Java?


I'm trying to download a small file (0.3 KB) from a given zip file that's around 3-5 GB in size.

I am currently using the native library libfragmentzip through JNA. It is very fast, but it brings the usual problems of native libraries, such as not being cross-platform.

I have tried this solution, but it is much slower: it takes minutes, whereas libfragmentzip seems to take only seconds.

This is a URL to a test zip file (the extension is .ipsw but it is really a zip). The file I am trying to download is BuildManifest.plist, in the root of the zip.

Is there a fast way to download a single file from a remote zip file without using a native library?


Solution

  • Use Apache Commons Compress with a custom SeekableByteChannel implementation backed by HTTP range requests:

    var url = new URL(...);
    var fileName = "file.txt";
    var dest = Path.of(fileName);
    
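    // Arguments: channel, archive name (used in error messages), encoding,
    // useUnicodeExtraFields, ignoreLocalFileHeader. Passing true for the last one
    // avoids a seek (and so an extra HTTP request) per entry when the archive is opened.
    try (var zip = new ZipFile(new HttpChannel(url), "zip", "UTF8", true, true);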
        var stream = zip.getInputStream(zip.getEntry(fileName))) {
        Files.copy(stream, dest, StandardCopyOption.REPLACE_EXISTING);
    }
    

    HttpChannel (modified from JCodec):

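    import java.io.IOException;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.ByteBuffer;
    import java.nio.channels.Channels;
    import java.nio.channels.ReadableByteChannel;
    import java.nio.channels.SeekableByteChannel;
    
    public class HttpChannel implements SeekableByteChannel {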
    
        private final URL url;
        private ReadableByteChannel ch;
        private long pos;
        private long length;
    
        public HttpChannel(URL url) {
            this.url = url;
        }
    
        @Override
        public long position() {
            return pos;
        }
    
        @Override
        public SeekableByteChannel position(long newPosition) throws IOException {
            if (newPosition == pos) {
                return this;
            } else if (ch != null) {
                ch.close();
                ch = null;
            }
            pos = newPosition;
            return this;
        }
    
        @Override
        public long size() throws IOException {
            ensureOpen();
            return length;
        }
    
        @Override
        public SeekableByteChannel truncate(long size) {
            throw new UnsupportedOperationException("Truncate on HTTP is not supported.");
        }
    
        @Override
        public int read(ByteBuffer buffer) throws IOException {
            ensureOpen();
            int read = ch.read(buffer);
            if (read != -1)
                pos += read;
            return read;
        }
    
        @Override
        public int write(ByteBuffer buffer) {
            throw new UnsupportedOperationException("Write to HTTP is not supported.");
        }
    
        @Override
        public boolean isOpen() {
            return ch != null && ch.isOpen();
        }
    
        @Override
        public void close() throws IOException {
            // ch is null before the first read or size() call and after every seek
            if (ch != null) {
                ch.close();
            }
        }
    
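        // Lazily opens a connection that starts at pos. A seek closes the
        // connection, so the next read or size() call reconnects from the new offset.
        private void ensureOpen() throws IOException {
            if (ch == null) {
                URLConnection connection = url.openConnection();
                if (pos > 0)
                    connection.addRequestProperty("Range", "bytes=" + pos + "-");
                ch = Channels.newChannel(connection.getInputStream());
                String resp = connection.getHeaderField("Content-Range");
                if (resp != null) {
                    // e.g. "bytes 100-199/5000": the total size follows the slash
                    length = Long.parseLong(resp.split("/")[1]);
                } else if (pos > 0) {
                    // No Content-Range: the server ignored the Range header and is
                    // sending the file from byte 0, so reads would be misaligned.
                    ch.close();
                    ch = null;
                    throw new IOException("Server does not support range requests: " + url);
                } else {
                    // Full response; returns -1 if the server sent no Content-Length
                    length = connection.getContentLengthLong();
                }
            }
        }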
    }
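
The size lookup in ensureOpen depends on how the server reports the total length. As a minimal sketch of that header handling in isolation, the helper below takes the raw Content-Range and Content-Length values and returns the total size, or -1 when it cannot be determined (for example when the server answers with "bytes 0-99/*"). The names RangeHeaders and totalLength are hypothetical and only for illustration; they are not part of JCodec or Commons Compress.

```java
// Hypothetical helper, for illustration only: derives the total size of a
// remote resource from the headers of a (possibly partial) HTTP response.
final class RangeHeaders {

    /**
     * @param contentRange  value of the Content-Range header, e.g.
     *                      "bytes 100-199/5000", or null if absent
     * @param contentLength value of the Content-Length header, or null if absent
     * @return the total size in bytes, or -1 if the headers do not state it
     * @throws NumberFormatException if a header contains a malformed number
     */
    static long totalLength(String contentRange, String contentLength) {
        if (contentRange != null) {
            // Partial response: the total size follows the last slash.
            int slash = contentRange.lastIndexOf('/');
            if (slash >= 0) {
                String total = contentRange.substring(slash + 1).trim();
                // "*" means the server does not know the total size.
                if (!total.isEmpty() && !total.equals("*")) {
                    return Long.parseLong(total);
                }
            }
            return -1L;
        }
        // Full response: Content-Length is the size of the whole resource.
        return contentLength == null ? -1L : Long.parseLong(contentLength.trim());
    }

    public static void main(String[] args) {
        System.out.println(totalLength("bytes 100-199/5000", "100")); // prints 5000
        System.out.println(totalLength(null, "5000"));                // prints 5000
        System.out.println(totalLength("bytes 0-99/*", "100"));       // prints -1
    }
}
```

Returning -1 instead of throwing mirrors URLConnection.getContentLengthLong(), so a caller can decide whether an unknown size is fatal. For a zip reader it is, because the archive index is located relative to the end of the file.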