Tags: hadoop, mapreduce, hbase

MultiTableInput MapReduce with ResultSerialization


I understand that Result serialization is set up for me when I use a single input table:

TableMapReduceUtil.initTableMapperJob( tableName, scan, Mapper.class, Text.class, Result.class, job );

Any ideas how I can achieve the same while using MultiTableInput (multiple scans as input)?

TableMapReduceUtil.initTableMapperJob( scans, SummaryMapper.class, Text.class, Result.class, job );

I get the following error while running the MR job:

INFO mapreduce.Job: Task Id : attempt_1492475015807_0003_m_000003_2, Status : FAILED
Error: java.lang.NullPointerException
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:988)
        at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
        at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)


Solution

  • I was able to get past this error by explicitly specifying the ResultSerialization class in the job configuration. The NullPointerException in MapOutputBuffer.init is typically thrown because Hadoop's SerializationFactory cannot find a serializer registered for the map output value class (Result here). Note that this must be done before the Job instance is created, since the Job copies the Configuration at creation time:

    config.setStrings( "io.serializations", config.get( "io.serializations" ),
                        MutationSerialization.class.getName(), ResultSerialization.class.getName(),
                        KeyValueSerialization.class.getName() );
    

    This is how it is specified internally when the mapper is initialized with a single input table. I will update this answer if I have any further findings, so that it may be useful for others who need it.
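
    For reference, here is a minimal sketch of a full driver showing the ordering that matters. The table names, the job name, and the `MultiScanDriver` class are placeholders, and `SummaryMapper` is assumed to be the mapper from the question; the key point is that `io.serializations` is set on the Configuration before `Job.getInstance(config)` snapshots it:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.KeyValueSerialization;
    import org.apache.hadoop.hbase.mapreduce.MutationSerialization;
    import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class MultiScanDriver {
        public static void main(String[] args) throws Exception {
            Configuration config = HBaseConfiguration.create();

            // Register the HBase serializations BEFORE creating the Job;
            // Job.getInstance() copies the Configuration, so changes made
            // afterwards are not seen by the job.
            config.setStrings("io.serializations", config.get("io.serializations"),
                    MutationSerialization.class.getName(),
                    ResultSerialization.class.getName(),
                    KeyValueSerialization.class.getName());

            Job job = Job.getInstance(config, "multi-table-summary");
            job.setJarByClass(MultiScanDriver.class);

            // One Scan per input table; each Scan carries its table name
            // as an attribute so MultiTableInputFormat knows where it runs.
            // "table1" and "table2" are placeholder names.
            List<Scan> scans = new ArrayList<>();
            for (String tableName : new String[] { "table1", "table2" }) {
                Scan scan = new Scan();
                scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME,
                        Bytes.toBytes(tableName));
                scans.add(scan);
            }

            // Same call as in the question: Text keys, Result values.
            TableMapReduceUtil.initTableMapperJob(scans, SummaryMapper.class,
                    Text.class, Result.class, job);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    ```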