Tags: java, parallel-processing, compression, bzip2, apache-commons-compress

Parallel BZip2 Compression


I'm using Apache Commons Compress for Java to compress multiple log files into a single tar.bz2 archive.

However, compression takes a very long time (more than 12 hours), because I compress around 20 GB of files a day.

Because this library compresses on a single thread, I'd like to know whether there is a way to do this with multiple threads.

I found many solutions (the pbzip2 command-line tool, some C++ libraries), but the only thing I found for Java is this blog post:

https://plus.google.com/117421466255362255970/posts/3jfKVu325zh

It seems that I can't use it in my Java application.

Is there anything out there? What would you recommend? Or is there another, faster solution with compression ratios similar to bzip2?


Solution

  • Since you have multiple files, you can compress each file in a different thread. Because the process is CPU-bound, I suggest creating a fixed-size thread pool, e.g. an ExecutorService, and submitting one task per file to compress.

    Note: if pbzip2 does what you want, I would call it from Java. You might find it is fast even with a single thread, because the BZIP2 libraries I have seen for Java are not natively implemented (unlike JAR, ZIP and GZIP).
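A minimal sketch of the per-file thread-pool approach. To keep it compilable with the JDK alone, GZIPOutputStream stands in for the bzip2 compressor; with Apache Commons Compress on the classpath you would pass `BZip2CompressorOutputStream::new` and a ".bz2" suffix instead. The names ParallelFileCompressor, compressAll and CompressorFactory are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

public class ParallelFileCompressor {

    /** Wraps a raw output stream in a compressing one (hypothetical helper type). */
    @FunctionalInterface
    interface CompressorFactory {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /**
     * Compresses each input file to "<name><suffix>" as its own task on a
     * fixed-size pool. Returns the output paths in the same order as the inputs.
     */
    static List<Path> compressAll(List<Path> inputs, String suffix,
                                  CompressorFactory factory, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (Path in : inputs) {
                // One CPU-bound task per file.
                futures.add(pool.submit(() -> {
                    Path out = in.resolveSibling(in.getFileName() + suffix);
                    try (InputStream src = Files.newInputStream(in);
                         OutputStream raw = Files.newOutputStream(out);
                         OutputStream dst = factory.wrap(raw)) {
                        src.transferTo(dst); // Java 9+
                    }
                    return out;
                }));
            }
            List<Path> outputs = new ArrayList<>();
            for (Future<Path> f : futures) {
                outputs.add(f.get()); // rethrows a task failure as ExecutionException
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo input: a few generated log files in a temp directory.
        Path dir = Files.createTempDirectory("logs");
        List<Path> inputs = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Path p = dir.resolve("app-" + i + ".log");
            Files.writeString(p, ("log line " + i + "\n").repeat(10_000));
            inputs.add(p);
        }
        int threads = Runtime.getRuntime().availableProcessors();
        // JDK-only stand-in; with Commons Compress: BZip2CompressorOutputStream::new, ".bz2".
        for (Path out : compressAll(inputs, ".gz", GZIPOutputStream::new, threads)) {
            System.out.println(out + " " + Files.size(out) + " bytes");
        }
    }
}
```

Note that this produces one compressed file per input rather than a single tar.bz2 archive, so it fits best when the daily logs can be stored as separate files.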
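A minimal sketch of calling pbzip2 from Java with ProcessBuilder, assuming pbzip2 is installed and on the PATH, and that the log files have already been packed into an uncompressed .tar (for example with Commons Compress's TarArchiveOutputStream). The class and method names are hypothetical. `-p<N>` sets the number of processors pbzip2 uses and `-k` keeps the input file; pbzip2 writes `<file>.bz2` next to it.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Pbzip2Runner {

    /** Builds the command line: -p<N> = thread count, -k = keep the input file. */
    static List<String> buildCommand(Path tarFile, int threads) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pbzip2");
        cmd.add("-p" + threads);
        cmd.add("-k");
        cmd.add(tarFile.toString());
        return cmd;
    }

    /** Runs pbzip2 and waits for it; pbzip2 writes <tarFile>.bz2 alongside the input. */
    static void compress(Path tarFile, int threads) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(tarFile, threads))
                .inheritIO() // show pbzip2's own output and errors
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("pbzip2 exited with status " + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        if (args.length == 0) {
            // No archive given: only show the command that would run.
            System.out.println("Usage: java Pbzip2Runner <archive.tar>");
            System.out.println("Would run: "
                    + String.join(" ", buildCommand(Paths.get("logs.tar"), threads)));
            return;
        }
        compress(Paths.get(args[0]), threads);
    }
}
```

Running an external process keeps the heavy lifting in native code, at the cost of a dependency on pbzip2 being present on every machine that runs the job.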