
Avro Map-Reduce on Oozie


I have been trying to run an Avro map-reduce job on Oozie. I specify the mapper and reducer classes in workflow.xml and provide the other configs too, but the job fails with:

java.lang.RuntimeException: class mr.sales.avro.etl.SalesMapper not org.apache.hadoop.mapred.Mapper

The same job completes and gives the desired output when run directly on a Hadoop cluster (not via Oozie), so I am probably missing some Oozie config. My guess from the exception is that Oozie requires the mapper to be a subclass of org.apache.hadoop.mapred.Mapper, but Avro mappers have a different signature: they extend org.apache.avro.mapred.AvroMapper, and this may be the reason for the error.

So my question is: how do I configure the Oozie workflow/properties file so that it can run an Avro map-reduce job?


Solution

  • With Avro, you'll need to configure a few extra properties:

    • Set the job's mapper class to org.apache.avro.mapred.HadoopMapper. This is the class that actually implements the org.apache.hadoop.mapred.Mapper interface.
    • Set the avro.mapper property to the name of your SalesMapper class (mr.sales.avro.etl.SalesMapper).

    The combiner and reducer have equivalent properties; check the AvroJob source and its utility methods for the names.

    Another way of doing this is to examine the job.xml from a job you submitted manually, and copy the relevant configuration properties over to your Oozie workflow.xml.
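
    As a rough sketch of what the properties above might look like in a workflow.xml map-reduce action: the action name, the ${jobTracker}/${nameNode} parameters, the transition targets, and the SalesReducer class are assumptions for illustration and do not come from the question. The property names are the ones used by the old mapred API and by AvroJob as far as I know; verify them against the job.xml of a job that ran successfully.

```xml
<!-- Sketch only: fragment of workflow.xml, not a complete workflow -->
<action name="avro-sales-mr">                 <!-- assumed action name -->
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>  <!-- assumed parameter names -->
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Hadoop is given Avro's wrapper class, which implements
           org.apache.hadoop.mapred.Mapper -->
      <property>
        <name>mapred.mapper.class</name>
        <value>org.apache.avro.mapred.HadoopMapper</value>
      </property>
      <!-- The AvroMapper subclass from the question is named here -->
      <property>
        <name>avro.mapper</name>
        <value>mr.sales.avro.etl.SalesMapper</value>
      </property>
      <!-- Reducer follows the same pattern; SalesReducer is hypothetical -->
      <property>
        <name>mapred.reducer.class</name>
        <value>org.apache.avro.mapred.HadoopReducer</value>
      </property>
      <property>
        <name>avro.reducer</name>
        <value>mr.sales.avro.etl.SalesReducer</value>
      </property>
    </configuration>
  </map-reduce>
  <ok to="end"/>                              <!-- assumed transition targets -->
  <error to="fail"/>
</action>
```

    This is probably not sufficient on its own. When a job is set up in code, AvroJob's setters also record the schemas (for example avro.input.schema), the input and output formats, and the serialization settings, so those entries most likely need to be copied from the working job.xml as well.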