Edit YARN's classpath in Oozie


I am trying to run a Hadoop job through Oozie. The job uploads data to DynamoDB in AWS, so I use AmazonDynamoDBClient. I get the following exception in the reducers:

2016-06-14 10:30:52,997 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:458)
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:379)
    at com.amazonaws.util.json.Jackson.<clinit>(Jackson.java:32)
    at com.amazonaws.internal.config.InternalConfig.loadfrom(InternalConfig.java:233)
    at com.amazonaws.internal.config.InternalConfig.load(InternalConfig.java:251)
    at com.amazonaws.internal.config.InternalConfig$Factory.<clinit>(InternalConfig.java:308)
    at com.amazonaws.util.VersionInfoUtils.userAgent(VersionInfoUtils.java:139)
    at com.amazonaws.util.VersionInfoUtils.initializeUserAgent(VersionInfoUtils.java:134)
    at com.amazonaws.util.VersionInfoUtils.getUserAgent(VersionInfoUtils.java:95)
    at com.amazonaws.ClientConfiguration.<clinit>(ClientConfiguration.java:42)
    at com.amazonaws.PredefinedClientConfigurations.dynamoDefault(PredefinedClientConfigurations.java:38)
    at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.<init>(AmazonDynamoDBClient.java:292)
    at com.mypackage.UploadDataToDynamoDBMR$DataUploaderReducer.setup(UploadDataToDynamoDBMR.java:396)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

I built a fat jar that packages all dependencies and copied it to Oozie's lib directory.

I have also used dependency management in the pom to pin the FasterXML Jackson dependency to 2.4.1 (the version used by the AWS DynamoDB SDK). However, when the job runs on the reducers, some other version of FasterXML Jackson seems to appear first on the classpath (or so I believe).

I also excluded the Jackson dependency from the DynamoDB and AWS core SDKs:

<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-dynamodb</artifactId>
  <version>1.10.11</version>
  <exclusions>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-core</artifactId>
  <version>1.10.11</version>
  <exclusions>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>

How can I make sure that my jar is the first one on the classpath in mappers and reducers? I tried the suggestion on this page and added the following property to the job's configuration xml:

<property>
    <name>oozie.launcher.mapreduce.user.classpath.first</name>
    <value>true</value>
</property>

But this did not help.

Any suggestions?


Solution

  • Have you copied your jar into the lib folder next to workflow.xml, or into the sharelib?

    Check which version of Jackson your Hadoop distribution uses and try to use that version everywhere. It might also be worth checking that no other Jackson jars are on the classpath. Judging by the stack trace, the ObjectMapper constructor is trying to call this method:

    com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering

    This method was introduced in Jackson 2.3, so an older version of jackson-core is probably somewhere on the classpath and is being loaded ahead of yours.
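
One way to check this from inside the task is a small probe like the one below. It is only a sketch: the class name ClasspathProbe and its helper methods are made up for illustration, and it uses nothing beyond the JDK. It reports which jar a class was actually loaded from, whether that copy has the method in question, and every copy of the class file the class loader can see.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Hadoop, Oozie or the AWS SDK.
class ClasspathProbe {

    // Jar or directory the class was loaded from, if the JVM exposes it.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    // True if the loaded class has a public method with this name.
    static boolean hasMethod(Class<?> cls, String name) {
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    // Every copy of the class file visible to this class's class loader.
    static List<URL> copiesOf(String className) {
        String path = className.replace('.', '/') + ".class";
        List<URL> urls = new ArrayList<>();
        try {
            ClassLoader loader = ClasspathProbe.class.getClassLoader();
            if (loader == null) {
                loader = ClassLoader.getSystemClassLoader();
            }
            urls.addAll(Collections.list(loader.getResources(path)));
        } catch (IOException e) {
            // Leave the list empty if the classpath cannot be scanned.
        }
        return urls;
    }

    // One-line summary suitable for a log statement.
    static String describe(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            return className + " loaded from " + locationOf(cls)
                    + "; " + methodName + " present: " + hasMethod(cls, methodName);
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        String cls = "com.fasterxml.jackson.core.JsonFactory";
        System.out.println(describe(cls, "requiresPropertyOrdering"));
        for (URL url : copiesOf(cls)) {
            System.out.println("copy: " + url);
        }
    }
}
```

Calling ClasspathProbe.describe(...) at the start of the reducer's setup(), before the AmazonDynamoDBClient is constructed, and logging the result should show in the task logs which jackson-core jar wins on the reducer's classpath. If more than one copy is listed, the first jar reported is the one to remove or align.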