I want to merge two bzip2-compressed files. I tried appending one to the other: cat file1.bzip2 file2.bzip2 > out.bzip2
This seems to work (the resulting file decompresses correctly), but when I use it as a Hadoop input file I get errors about corrupted blocks.
What's the best way to merge two bzip2-compressed files without decompressing them?
Handling of concatenated bzip2 files is fixed on trunk, or should be: https://issues.apache.org/jira/browse/HADOOP-4012
There are examples of it working: https://issues.apache.org/jira/browse/MAPREDUCE-477?focusedCommentId=12871993&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12871993
Make sure you're running a recent version of Hadoop and you should be fine.
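
To see why plain concatenation is valid at the format level, here is a minimal Python sketch (not Hadoop code). It assumes nothing beyond the standard `bz2` module; the sample data and the helper name `decompress_all_streams` are made up for illustration. Concatenating two compressed files yields one file holding two complete bzip2 streams back to back, so a reader has to keep going after the first end-of-stream marker to see all of the data. A reader that stops after one stream, or does not expect a second stream header, will miss or reject the rest.

```python
import bz2

# Two independently compressed inputs (stand-ins for file1 and file2).
part1 = bz2.compress(b"first file\n")
part2 = bz2.compress(b"second file\n")

# Byte-level concatenation, the same thing `cat` does.
merged = part1 + part2


def decompress_all_streams(data: bytes) -> bytes:
    """Decode every bzip2 stream in `data`, one decompressor per stream."""
    chunks = []
    while data:
        decoder = bz2.BZ2Decompressor()
        chunks.append(decoder.decompress(data))
        if not decoder.eof:
            raise ValueError("truncated bzip2 stream")
        # Bytes after the end-of-stream marker belong to the next stream.
        data = decoder.unused_data
    return b"".join(chunks)


# A single BZ2Decompressor stops at the end of the first stream.
single = bz2.BZ2Decompressor()
first_only = single.decompress(merged)

# Multi-stream-aware decoding recovers both inputs.
everything = decompress_all_streams(merged)
```

Here `first_only` holds only the contents of the first input, while `everything` holds both. The command-line `bzip2 -d` tool reads all streams, which would explain why the merged file decompresses correctly there even when another reader complains.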