Combining compressed Gzipped Text Files using Java


My question might not be entirely Java-specific, but I'm looking for a way to combine several gzip-compressed text files without having to recompress them. Let's say I have 4 files, all text compressed with gzip, and I want to combine them into one single *.gz file without decompressing and recompressing them. My current method is to open an InputStream, read the file line by line, and write the lines to a GZIPOutputStream. This works but isn't very fast. I could of course also call

    zcat file1 file2 file3 | gzip -c > output_all_four.gz

This would work too, but it isn't really fast either.

My idea is to copy the input stream to the output stream directly, without "parsing" it, since I don't actually need to manipulate the content. Is something like this possible?


Solution

  • Below is a simple solution in Java (it does the same as my cat ... example). Buffering of the input and output streams has been omitted to keep the code short.

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.io.SequenceInputStream;
    import java.util.zip.GZIPInputStream;

    public class ConcatFiles {

        public static void main(String[] args) throws IOException {
            // Concatenate the individual gzip files into one gzip file.
            // The compressed bytes are copied as-is; nothing is decompressed.
            try (InputStream isOne = new FileInputStream("file1.gz");
                    InputStream isTwo = new FileInputStream("file2.gz");
                    InputStream isThree = new FileInputStream("file3.gz");
                    SequenceInputStream sis = new SequenceInputStream(new SequenceInputStream(isOne, isTwo), isThree);
                    OutputStream out = new FileOutputStream("output_all_three.gz")) {
                byte[] buffer = new byte[8192];
                int bytesRead;
                while ((bytesRead = sis.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesRead);
                }
            }

            // Decompress the combined gzip file to check the result: the output
            // contains the uncompressed content of all input files, in order.
            try (GZIPInputStream gzipis = new GZIPInputStream(new FileInputStream("output_all_three.gz"));
                    OutputStream out = new FileOutputStream("output_all_three")) {
                byte[] buffer = new byte[8192];
                int bytesRead;
                while ((bytesRead = gzipis.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesRead);
                }
            }
        }
    }
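
The raw byte copy works because the gzip format allows a file to consist of several compressed members one after another, and decompressors such as zcat and GZIPInputStream read them back as one continuous stream. For an arbitrary number of input files, the same idea could be sketched as follows. The class name GzipConcat, the method concat, and the argument convention (inputs first, output last) are assumptions made for illustration and are not part of the original answer:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: concatenates any number of gzip files byte for byte.
public class GzipConcat {

    // Appends the raw bytes of each input file to the target, in order.
    // Nothing is decompressed; each input ends up as one gzip member.
    static void concat(List<Path> inputs, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path input : inputs) {
                Files.copy(input, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: GzipConcat <in1.gz> [<in2.gz> ...] <out.gz>");
            return;
        }
        // Assumed convention: all arguments but the last are inputs.
        List<Path> inputs = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            inputs.add(Paths.get(args[i]));
        }
        concat(inputs, Paths.get(args[args.length - 1]));
    }
}
```

Since no compression work is done, the running time should be dominated by disk I/O. One trade-off to keep in mind: the combined file is exactly as large as the sum of its inputs, whereas recompressing everything as a single stream might yield a slightly smaller file.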