How to merge 2 bzip2'ed files?

I want to merge 2 bzip2'ed files. I tried appending one to the other: cat file1.bzip2 file2.bzip2 > out.bzip2. This seems to work (the resulting file decompresses correctly), but when I use it as a Hadoop input file, I get errors about corrupted blocks.

What's the best way to merge 2 bzip2'ed files without decompressing them?

Solution

Handling of concatenated bzip2 files is fixed on trunk, or should be: https://issues.apache.org/jira/browse/HADOOP-4012. There are examples of it working: https://issues.apache.org/jira/browse/MAPREDUCE-477?focusedCommentId=12871993&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12871993. The file produced by cat is a valid multi-stream bzip2 file, so no re-compression should be needed. Make sure you're running a recent version of Hadoop and you should be fine.
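
To see why the concatenated file decompresses correctly with some tools but can trip up a reader that expects a single stream, here is a minimal sketch using Python's standard bz2 module. It is only an analogy, not Hadoop's actual code path: merge_streams and read_first_stream_only are hypothetical helper names made up for this illustration, and the sample data is invented.

```python
import bz2


def merge_streams(*compressed: bytes) -> bytes:
    """Byte-level concatenation, the same thing `cat a.bz2 b.bz2` does."""
    return b"".join(compressed)


def read_first_stream_only(data: bytes):
    """Mimic a reader that stops at the first end-of-stream marker.

    Returns the decompressed bytes of the first stream and whatever
    compressed bytes were left unread after it.
    """
    decompressor = bz2.BZ2Decompressor()
    out = decompressor.decompress(data)
    return out, decompressor.unused_data


# Two independently compressed inputs (stand-ins for file1 and file2).
first = bz2.compress(b"alpha\n")
second = bz2.compress(b"beta\n")
merged = merge_streams(first, second)

# A multi-stream-aware reader sees both payloads back to back.
full = bz2.decompress(merged)

# A single-stream reader only sees the first payload; the second
# stream is left over as unread compressed bytes.
partial, leftover = read_first_stream_only(merged)
```

Here full holds the contents of both inputs, while partial holds only the first one and leftover is the untouched second stream. A reader that handles only one stream per file would behave like the second case, which is consistent with the errors described in the question on older Hadoop versions.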
