java http gzip http-compression content-encoding

Does a GunzipOutputStream - or something like it - exist?


Related to Handling HTTP ContentEncoding "deflate", I'd like to know how to use an OutputStream to inflate both gzip and deflate streams. Here's why:

I have a class that fetches resources from a web server (think wget, but in Java). It strictly enforces the Content-Length of the response, and I'd like to keep that enforcement. So what I'd like to do is read a specific number of bytes from the response (which I'm already doing) but have the output side generate more bytes if the response has been compressed.

I have this working for deflate responses like this:

OutputStream out = System.out;
out = new InflaterOutputStream(out); // java.util.zip.InflaterOutputStream
// repeatedly:
out.write(compressedBytesFromResponse);
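
For reference, the snippet above can be fleshed out into something runnable. This is only a sketch: the names InflateOnWriteDemo, deflate and inflateOnWrite are made up for illustration, and it assumes the "deflate" body is zlib-wrapped, which is what InflaterOutputStream expects by default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

class InflateOnWriteDemo {
    /** Compresses data in zlib format (a stand-in for a "deflate" response body). */
    static byte[] deflate(byte[] plain) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream z = new DeflaterOutputStream(bytes)) {
            z.write(plain);
        }
        return bytes.toByteArray();
    }

    /** Writes compressed bytes in small chunks; the sink receives the inflated data. */
    static byte[] inflateOnWrite(byte[] compressed, int chunkSize) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new InflaterOutputStream(sink)) {
            for (int off = 0; off < compressed.length; off += chunkSize) {
                int len = Math.min(chunkSize, compressed.length - off);
                out.write(compressed, off, len);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] restored = inflateOnWrite(deflate(plain), 5);
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

The caller only ever counts and writes compressed bytes, so Content-Length enforcement stays on the compressed side while the sink sees the expanded data.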

I'd like to be able to do the same thing with gzip responses, but without a GunzipOutputStream, I'm not sure what to do next.

Update

I was considering building something like this, but it seemed completely insane. Perhaps that is the only way to use an OutputStream to inflate my data.


Solution

  • Answering my own question:

    There are two possibilities here: gunzip on output (i.e. use a GunzipOutputStream, which the Java API does not provide), or gunzip on input (i.e. use GZIPInputStream, which the Java API does provide) and enforce the Content-Length during the reads.

    I have done both, and I think I prefer the latter because a) it does not require a separate thread to pump bytes from a PipedOutputStream to a PipedInputStream and b) (a corollary, I guess) it does not carry the same risk of race conditions and other synchronization issues.

    First, here is my implementation of LimitedInputStream, which allows me to wrap the input stream and enforce a limit on the amount of data read. Note that I also have a BigLimitedInputStream that uses a BigInteger count to support Content-Length values greater than Long.MAX_VALUE:

    import java.io.IOException;
    import java.io.InputStream;

    public class LimitedInputStream
        extends InputStream
    {
        private long _limit;
        private long _read;
        private InputStream _in;
    
        public LimitedInputStream(InputStream in, long limit)
        {
            _limit = limit;
            _in = in;
            _read = 0;
        }

        @Override
        public int available()
            throws IOException
        {
            // Never report more bytes than the limit still allows
            return (int)Math.min((long)_in.available(), _limit - _read);
        }
    
        @Override
        public void close()
            throws IOException
        {
            _in.close();
        }
    
        @Override
        public boolean markSupported()
        {
            return false;
        }
    
        @Override
        public int read()
            throws IOException
        {
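            // Check the limit first so a byte beyond it is never consumed
            if(_read >= _limit)
                return -1;
                // throw new IOException("Read limit reached: " + _limit);

            int read = _in.read();

            // Only count real bytes; -1 means the underlying stream ended early
            if(-1 != read)
                ++_read;

            return read;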
        }
    
        @Override
        public int read(byte[] b)
            throws IOException
        {
            return read(b, 0, b.length);
        }
    
        @Override
        public int read(byte[] b, int off, int len)
            throws IOException
        {
            // 'len' is an int, so 'max' is an int; narrowing cast is safe
            int max = (int)Math.min(_limit - _read, (long)len);

            if(0 == max && len > 0)
                return -1;
                //throw new IOException("Read limit reached: " + _limit);

            int read = _in.read(b, off, max);

            // Only count bytes actually read; -1 signals end-of-stream
            // and must not be subtracted from the running total
            if(read > 0)
                _read += read;

            return read;
        }
    
        @Override
        public long skip(long n)
            throws IOException
        {
            long max = Math.min((long)(_limit - _read), n);
    
            if(0 == max)
                return 0;
    
            long read = _in.skip(max);
    
            _read += read;
    
            return read;
        }
    }
    

    Wrapping the InputStream obtained from the HttpURLConnection in the above class lets me simplify my existing code: instead of reading the precise number of bytes given in the Content-Length header myself, I can just blindly copy input to output. I then wrap that stream (already wrapped in the LimitedInputStream) in a GZIPInputStream to decompress, and pump the bytes from the doubly-wrapped input to the output.
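
    The double wrapping described above can be sketched roughly as follows. This is a minimal illustration rather than the actual fetcher code: GunzipOnInputDemo, Bounded, gzip and readGzipBody are hypothetical names, a ByteArrayInputStream stands in for the connection's stream, and Bounded is a compact stand-in for LimitedInputStream that, unlike it, leaves close() as a no-op so the underlying stream stays open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GunzipOnInputDemo {
    /** Compact stand-in for LimitedInputStream: reports end-of-stream after 'limit' bytes. */
    static final class Bounded extends InputStream {
        private final InputStream in;
        private long remaining;

        Bounded(InputStream in, long limit) { this.in = in; this.remaining = limit; }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;
            int b = in.read();
            if (b >= 0) remaining--;
            return b;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = in.read(b, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }
        // close() is inherited as a no-op, so closing the wrapper leaves 'in' open
    }

    /** Builds a gzip-compressed body for the demo. */
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream z = new GZIPOutputStream(bytes)) {
            z.write(plain);
        }
        return bytes.toByteArray();
    }

    /** Decompresses at most contentLength compressed bytes from 'raw'; later bytes stay unread. */
    static byte[] readGzipBody(InputStream raw, long contentLength) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new Bounded(raw, contentLength))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) >= 0) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("hello, gzip".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(body);                                        // the compressed response body
        wire.write("NEXT".getBytes(StandardCharsets.US_ASCII));  // bytes that belong to something else
        ByteArrayInputStream raw = new ByteArrayInputStream(wire.toByteArray());

        byte[] plain = readGzipBody(raw, body.length);
        System.out.println(new String(plain, StandardCharsets.UTF_8));
        System.out.println(raw.available() + " bytes left unread");
    }
}
```

    The point of the ordering is that the limit applies to the compressed bytes, which is what Content-Length counts, while the caller only ever sees decompressed data.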

    The less-straightforward route is to pursue my original line of thought: to wrap the OutputStream using (what turned out to be) an awkward class: GunzipOutputStream. I have written a GunzipOutputStream which uses an internal thread to pump bytes through a pair of piped streams. It's ugly, and it's based upon code from OpenRDF's GunzipOutputStream. I think mine is a bit simpler:

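    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;
    import java.util.zip.GZIPInputStream;

    public class GunzipOutputStream
        extends OutputStream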
    {
        final private Thread _pump;
    
        // Streams
        final private PipedOutputStream _zipped;  // Compressed bytes are written here (by clients)
        final private PipedInputStream _pipe; // Compressed bytes are read (internally) here
        final private OutputStream _out; // Uncompressed data is written here (by the pump thread)
    
        // Internal state
        private volatile IOException _e; // Set by the pump thread, rethrown on the caller's thread
    
        public GunzipOutputStream(OutputStream out)
            throws IOException
        {
            _zipped = new PipedOutputStream();
            _pipe = new PipedInputStream(_zipped);
            _out = out;
            _pump = new Thread(new Runnable() {
                public void run() {
                    InputStream in = null;
                    try
                    {
                        in = new GZIPInputStream(_pipe);
    
                        pump(in, _out);
                    }
                    catch (IOException e)
                    {
                        _e = e;
                        System.err.println(e);
                        _e.printStackTrace();
                    }
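                    finally
                    {
                        // 'in' is still null if the GZIPInputStream constructor threw;
                        // close the pipe either way so that writers fail instead of blocking
                        try { if(null != in) in.close(); else _pipe.close(); }
                        catch (IOException ioe) { ioe.printStackTrace(); }
                    }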
                }
    
                private void pump(InputStream in, OutputStream out)
                    throws IOException
                {
                    long count = 0;
    
                    byte[] buf = new byte[4096];
    
                    int read;
                    while ((read = in.read(buf)) >= 0) {
                        System.err.println("===> Pumping " + read + " bytes");
                        out.write(buf, 0, read);
                        count += read;
                    }
                    out.flush();
                    System.err.println("===> Pumped a total of " + count + " bytes");
                }
            }, "GunzipOutputStream stream pump " + GunzipOutputStream.this.hashCode());
    
            _pump.start();
        }
    
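        public void close() throws IOException {
            // Closing the write end signals end-of-stream to the pump thread
            _zipped.close();
            // Wait for the pump to drain the pipe before closing anything else;
            // closing _pipe first could discard data that has not been inflated yet
            finish();
            _out.close();
            throwIOException();
        }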
    
        public void flush() throws IOException {
            throwIOException();
            _zipped.flush();
        }
    
        public void write(int b) throws IOException {
            throwIOException();
            _zipped.write(b);
        }
    
        public void write(byte[] b) throws IOException {
            throwIOException();
            _zipped.write(b);
        }
    
        public void write(byte[] b, int off, int len) throws IOException {
            throwIOException();
            _zipped.write(b, off, len);
        }
    
        public String toString() {
            return _zipped.toString();
        }
    
        protected void finish()
            throws IOException
        {
            try
            {
                _pump.join();
                _pipe.close();
                _zipped.close();
            }
            catch (InterruptedException ie)
            {
                // Preserve the interrupt for callers instead of swallowing it
                Thread.currentThread().interrupt();
            }
        }
    
        private void throwIOException()
            throws IOException
        {
            if(null != _e)
            {
                IOException e = _e;
                _e = null; // Clear the existing error
                throw e;
            }
        }
    }
    

    Again, this works, but it seems fairly ... fragile.

    In the end, I re-factored my code to use the LimitedInputStream and GZIPInputStream and didn't use the GunzipOutputStream. If the Java API provided a GunzipOutputStream, it would have been great. But it doesn't, and without writing a "native" gunzip algorithm, implementing your own GunzipOutputStream stretches the limits of propriety.
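
    For what it's worth, there is a middle ground between the piped-thread approach and a fully "native" gunzip: skip the gzip header by hand and let java.util.zip.Inflater (in raw "nowrap" mode) do the actual decompression. The following is a rough sketch under stated assumptions, not a complete RFC 1952 implementation: InflatingGzipOutputStream and its gunzip helper are hypothetical names, only a single gzip member is handled, and the CRC32/ISIZE trailer and the optional header CRC are skipped without being verified.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

/** Hypothetical thread-free gunzip-on-write stream: one member, trailer not verified. */
class InflatingGzipOutputStream extends FilterOutputStream {
    private enum State { FIXED, EXTRA_LEN, EXTRA, NAME, COMMENT, HCRC, BODY, DONE }

    private final Inflater inflater = new Inflater(true); // true = raw deflate, no zlib wrapper
    private final byte[] buf = new byte[4096];
    private State state = State.FIXED;
    private int count;    // bytes consumed within the current header field
    private int flags;    // FLG byte; each bit is cleared once its field is scheduled
    private int extraLen; // length of the optional FEXTRA field

    InflatingGzipOutputStream(OutputStream out) { super(out); }

    /** Selects the next optional header field (RFC 1952 order), or the body. */
    private void next() {
        count = 0;
        if ((flags & 4) != 0)       { flags &= ~4;  state = State.EXTRA_LEN; } // FEXTRA
        else if ((flags & 8) != 0)  { flags &= ~8;  state = State.NAME; }      // FNAME
        else if ((flags & 16) != 0) { flags &= ~16; state = State.COMMENT; }   // FCOMMENT
        else if ((flags & 2) != 0)  { flags &= ~2;  state = State.HCRC; }      // FHCRC
        else state = State.BODY;
    }

    /** Consumes a single header byte and advances the state machine. */
    private void header(int b) throws IOException {
        switch (state) {
            case FIXED: // ID1 ID2 CM FLG MTIME(4) XFL OS
                if ((count == 0 && b != 0x1f) || (count == 1 && b != 0x8b) || (count == 2 && b != 8))
                    throw new IOException("Not in gzip format");
                if (count == 3) flags = b;
                if (++count == 10) next();
                break;
            case EXTRA_LEN: // two-byte little-endian length
                extraLen |= b << (8 * count);
                if (++count == 2) { count = 0; if (extraLen == 0) next(); else state = State.EXTRA; }
                break;
            case EXTRA:
                if (++count == extraLen) next();
                break;
            case NAME: case COMMENT: // zero-terminated strings
                if (b == 0) next();
                break;
            default: // HCRC: two bytes, skipped without verification
                if (++count == 2) next();
        }
    }

    @Override
    public void write(int b) throws IOException { write(new byte[] { (byte) b }, 0, 1); }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        while (len > 0 && state != State.BODY && state != State.DONE) {
            header(b[off++] & 0xff);
            len--;
        }
        if (len == 0 || state == State.DONE) return; // anything after the deflate data is ignored
        inflater.setInput(b, off, len);
        try {
            int n;
            while ((n = inflater.inflate(buf)) > 0) out.write(buf, 0, n);
        } catch (DataFormatException e) {
            throw new IOException("Corrupt gzip data", e);
        }
        if (inflater.finished()) state = State.DONE;
    }

    @Override
    public void close() throws IOException { inflater.end(); super.close(); }

    /** Demo helper: pushes 'compressed' through the stream in chunks of 'chunk' bytes. */
    static byte[] gunzip(byte[] compressed, int chunk) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new InflatingGzipOutputStream(sink)) {
            for (int off = 0; off < compressed.length; off += chunk)
                out.write(compressed, off, Math.min(chunk, compressed.length - off));
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (OutputStream z = new GZIPOutputStream(gz)) {
            z.write("hello, gunzip".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(new String(gunzip(gz.toByteArray(), 7), StandardCharsets.UTF_8));
    }
}
```

    Because everything runs on the caller's thread, there is no pipe and nothing to synchronize; the price is hand-written header parsing and, in this sketch, no integrity check on the trailer.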