Tags: hadoop, mapreduce, hadoop-yarn, avro, snappy

MapReduce job not setting compression codec correctly


Hi, I have an MR2 job that takes Snappy-compressed Avro data as input, processes it, and writes the result to an output directory in Avro format. I expect the output Avro data to be Snappy-compressed as well, but it is not. The job is map-only.

I have set the following properties in my code:

conf.set("mapreduce.map.output.compress", "true");
conf.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");

But the output is still not Snappy-compressed.


Solution

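  • The following did the trick:

    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.SnappyCodec.class);

    The mapreduce.map.output.compress properties only control compression of the intermediate map output, not the files the job finally writes, which is why they had no effect on the output here.

    Please note that, in my case, this had to be done before setting the output path, and in the same order as shown above.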
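For context, here is a minimal sketch of how those two calls might sit in a map-only driver, using the new-API `org.apache.hadoop.mapreduce.lib.output.FileOutputFormat`. The class name `SnappyOutputDriver`, the helper `configure`, the job name, and the output path argument are all hypothetical; the mapper and the Avro input/output format wiring are omitted, so this only illustrates the compression settings and is not a complete job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SnappyOutputDriver {

    // Hypothetical helper: prepares a map-only job whose final output
    // is requested to be Snappy-compressed.
    static Job configure(Configuration conf, String outputDir) throws IOException {
        Job job = Job.getInstance(conf, "snappy-output-example");

        // Map-only job: with zero reducers, the mappers write the final output.
        job.setNumReduceTasks(0);

        // Compression of the job's final output, in the order given in the answer.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);

        // Output path set afterwards, as the answer recommends.
        FileOutputFormat.setOutputPath(job, new Path(outputDir));
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = configure(new Configuration(), args[0]);

        // Read the settings back through the same API to confirm what was stored.
        System.out.println("compress output: " + FileOutputFormat.getCompressOutput(job));
        System.out.println("codec: "
                + FileOutputFormat.getOutputCompressorClass(job, null).getName());
    }
}
```

Reading the values back with `FileOutputFormat.getCompressOutput` and `FileOutputFormat.getOutputCompressorClass` is a quick way to check the job configuration before submitting. In Hadoop 2.x these helpers store their values under `mapreduce.output.fileoutputformat.compress` and `mapreduce.output.fileoutputformat.compress.codec`, which are separate from the `mapreduce.map.output.compress*` keys used in the question.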