Tags: hdfs, apache-pig, deflate, bzip2, hdp

Pig creates .deflate files instead of .bz2 files


Here is my Pig script, which reads, filters, and then compresses data with bzip2. However, I am getting .deflate files instead of .bz2 files.

set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;
inputFile = LOAD '/dl/myfolder/' USING PigStorage('|') AS (col1,col2,col3,col4,col5,col6,col7,col8,col9,col10);
filteredFile = FILTER inputFile BY col7 IS NOT NULL;
STORE filteredFile INTO '/dl/myfolder/compressdata/' USING PigStorage('|');

Output file: /dl/myfolder/compressdata/part-m-00000.deflate

Thanks for your help.


Solution

  • Sorry for not updating sooner. We ran into this issue while upgrading from Pivotal to Hortonworks, and found that the following properties were set on the cluster:

    mapreduce.map.output.compress = true
    mapreduce.output.fileoutputformat.compress = true

    These override the output compression settings made in the Pig session, so the output is always compressed with the cluster's default codec. After setting both to false, we got the desired .bz2 output.

    Thanks Koji/John for your time and valuable inputs.

    Koji, regarding your suggestion: we have an older design in which everything uses bzip2, so switching to LZO would only be possible in the next upgrade :)
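
    For reference, a minimal sketch of how the fix described above might look if applied from the Pig script itself rather than in the cluster configuration. The answer does not say where the two properties were changed, so setting them per script is an assumption, and it only works if the cluster does not mark them as final:

    -- Assumption: these cluster defaults can be overridden per script (not marked final).
    -- Turn off the two MapReduce-level settings named in the answer above.
    set mapreduce.map.output.compress false;
    set mapreduce.output.fileoutputformat.compress false;
    -- Keep the Pig-level settings from the question so PigStorage writes bzip2 output.
    set output.compression.enabled true;
    set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;

    The LOAD, FILTER, and STORE statements from the question would follow unchanged. If the properties are locked at the cluster level, they would have to be changed in the cluster configuration instead.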