
In what order should I wrap GZIPOutputStream and BufferedOutputStream?


Can anyone recommend whether I should do something like:

os = new GZIPOutputStream(new BufferedOutputStream(...));

or

os = new BufferedOutputStream(new GZIPOutputStream(...));

Which is more efficient? Should I use BufferedOutputStream at all?


Solution

    For object streams, I found that wrapping the buffered stream around the gzip stream, for both input and output, was almost always significantly faster. The smaller the objects, the bigger the gain. It was better than or equal to using no buffered stream in every case.

    ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(fis)));
    oos = new ObjectOutputStream(new BufferedOutputStream(new GZIPOutputStream(fos)));
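
    As a minimal sketch of the object-stream ordering above, the following round-trips a list of objects through a gzip-compressed file. The class name `GzipObjectRoundTrip`, the helper methods, and the leading element count are illustrative assumptions, not part of the original test code.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
final class GzipObjectRoundTrip {

    // Object -> Buffered -> GZIP -> file: small object writes are batched
    // by the buffer before they reach the compressor.
    static void writeAll(Path file, List<? extends Serializable> objects) throws IOException {
        try (OutputStream fos = Files.newOutputStream(file);
             ObjectOutputStream oos = new ObjectOutputStream(
                     new BufferedOutputStream(new GZIPOutputStream(fos)))) {
            oos.writeInt(objects.size()); // assumed framing: count first, then the objects
            for (Object o : objects) {
                oos.writeObject(o);
            }
        } // closing oos flushes the buffer and finishes the gzip trailer
    }

    // file -> GZIP -> Buffered -> Object: the mirror image for reading.
    static List<Object> readAll(Path file) throws IOException, ClassNotFoundException {
        try (InputStream fis = Files.newInputStream(file);
             ObjectInputStream ois = new ObjectInputStream(
                     new BufferedInputStream(new GZIPInputStream(fis)))) {
            int count = ois.readInt();
            List<Object> result = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                result.add(ois.readObject());
            }
            return result;
        }
    }
}
```

    Using try-with-resources here matters for output in particular: the gzip trailer is only written when the stream is closed or finished, so an unclosed stream leaves a truncated file.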
    

    However, for text and straight byte streams, it was a toss-up, with the gzip stream around the buffered stream only slightly better. Both orderings beat using no buffered stream at all.

    reader = new InputStreamReader(new GZIPInputStream(new BufferedInputStream(fis)));
    writer = new OutputStreamWriter(new GZIPOutputStream(new BufferedOutputStream(fos)));
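
    A minimal sketch of the text ordering above, with gzip wrapped around the buffered file stream. The class name `GzipTextRoundTrip`, the helper methods, and the explicit UTF-8 charset are assumptions added for illustration; the original snippet relies on the platform default charset.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
final class GzipTextRoundTrip {

    // Writer -> GZIP -> Buffered -> file
    static void writeText(Path file, String text) throws IOException {
        try (OutputStream fos = Files.newOutputStream(file);
             Writer writer = new OutputStreamWriter(
                     new GZIPOutputStream(new BufferedOutputStream(fos)),
                     StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // file -> Buffered -> GZIP -> Reader
    static String readText(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream fis = Files.newInputStream(file);
             Reader reader = new InputStreamReader(
                     new GZIPInputStream(new BufferedInputStream(fis)),
                     StandardCharsets.UTF_8)) {
            char[] chunk = new char[8192];
            int n;
            // Read in chunks until end of stream.
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        }
        return sb.toString();
    }
}
```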
    

    I ran each version 20 times, discarded the first run, and averaged the rest. I also tried buffered-gzip-buffered, which was slightly better for objects and worse for text. I did not experiment with buffer sizes at all.
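
    The measurement procedure described above (repeat, drop the first run, average the rest) could be sketched roughly as follows. This is not the original benchmark code; the class `WarmupAveragedTimer`, the `Task` interface, and the use of `System.nanoTime()` are assumptions.

```java
// Hypothetical timing helper, for illustration only.
final class WarmupAveragedTimer {

    // A unit of work that may throw, e.g. one full read or write of a file.
    @FunctionalInterface
    interface Task {
        void run() throws Exception;
    }

    // Runs the task `runs` times, ignores the first (warm-up) run,
    // and returns the mean duration of the remaining runs in milliseconds.
    static double averageMillis(Task task, int runs) throws Exception {
        if (runs < 2) {
            throw new IllegalArgumentException("need at least 2 runs");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (i > 0) { // skip the first run
                totalNanos += elapsed;
            }
        }
        return totalNanos / 1e6 / (runs - 1);
    }
}
```

    A call such as `WarmupAveragedTimer.averageMillis(() -> readFile(path), 20)` would then mirror the 20-run setup, where `readFile` stands in for whichever stream ordering is being measured. A simple loop like this is only a rough guide; JIT compilation and disk caching can still skew the results.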


    For the object streams, I tested two serialized object files in the tens of megabytes. For the larger file (38 MB), the buffered stream around the gzip stream was 85% faster for reading (0.7 versus 5.6 seconds) but slightly slower for writing (5.9 versus 5.7 seconds). These objects contained some large arrays, which may have meant larger individual writes.

    method       crc     date  time    compressed    uncompressed  ratio
    defla   eb338650   May 19 16:59      14027543        38366001  63.4%
    

    For the smaller file (18 MB), it was 75% faster for reading (1.6 versus 6.1 seconds) and 40% faster for writing (2.8 versus 4.7 seconds). This file contained a large number of small objects.

    method       crc     date  time    compressed    uncompressed  ratio
    defla   92c9d529   May 19 16:56       6676006        17890857  62.7%
    

    For the text reader/writer, I used a 64 MB CSV text file. The gzip stream around the buffered stream was 11% faster for reading (950 versus 1070 milliseconds) and slightly faster for writing (7.9 versus 8.1 seconds).

    method       crc     date  time    compressed    uncompressed  ratio
    defla   c6b72e34   May 20 09:16      22560860        63465800  64.5%