Here is my Pig script, which reads, filters, and then compresses data with bzip2. However, I am getting .deflate files instead of .bz2 files.
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;
inputFile = LOAD '/dl/myfolder/' using PigStorage('|') AS (col1,col2,col3,col4,col5,col6,col7,col8,col9,col10);
filteredFile = FILTER inputFile BY col7 is not null;
store filteredFile into '/dl/myfolder/compressdata/' USING PigStorage('|');
Output file: /dl/myfolder/compressdata/part-m-00000.deflate
Thanks for your help.
Sorry for not updating this sooner. We ran into this issue when upgrading from Pivotal to Hortonworks, and found that the following two properties were set to true:
mapreduce.map.output.compress = true
mapreduce.output.fileoutputformat.compress = true
These override the in-session output compression settings, so the output is always compressed regardless of what the script requests. After setting both to false, we got the desired .bz2 output.
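For readers hitting the same problem, here is a minimal sketch of what a per-script version of this fix might look like. The post does not say where the two properties were changed, so doing it with Pig's set command is an assumption, and it will only take effect if the cluster configuration does not mark these properties as final.

```pig
-- Assumption: these properties are not marked final in the cluster config,
-- so a per-script override is permitted.
set mapreduce.map.output.compress false;
set mapreduce.output.fileoutputformat.compress false;

-- Session-level compression settings, unchanged from the original script.
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.BZip2Codec;
```

If the overrides are ignored, the properties would have to be changed in the cluster configuration instead.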
Thanks, Koji and John, for your time and valuable input.
Koji, regarding your suggestion: we have an older design in which everything uses bzip2, so switching to LZO would only be possible in the next upgrade :)