java amazon-web-services nullpointerexception classpath emr

Why does java.net.URL.toString throw a NullPointerException on EMR AMI 3.8.0?


My Hadoop job works fine on Amazon Elastic MapReduce (EMR) AMI 3.7.0. But when I upgrade to AMI 3.8.0, the toString method of the java.net.URL class starts throwing a NullPointerException:

java.lang.NullPointerException
  at java.net.URL.toExternalForm(URL.java:925)
  at java.net.URL.toString(URL.java:911)
  at com.snowplowanalytics.iglu.client.repositories.HttpRepositoryRef.lookupSchema(HttpRepositoryRef.scala:602)
  at com.snowplowanalytics.iglu.client.Resolver.recurse$1(Resolver.scala:236)
  at com.snowplowanalytics.iglu.client.Resolver.lookupSchema(Resolver.scala:247)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2$$anonfun$apply$6$$anonfun$apply$7.apply(validatableJson.scala:171)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2$$anonfun$apply$6$$anonfun$apply$7.apply(validatableJson.scala:170)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2$$anonfun$apply$6.apply(validatableJson.scala:170)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2$$anonfun$apply$6.apply(validatableJson.scala:169)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2.apply(validatableJson.scala:169)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$$anonfun$verifySchemaAndValidate$2.apply(validatableJson.scala:166)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonMethods$.verifySchemaAndValidate(validatableJson.scala:166)
  at com.snowplowanalytics.iglu.client.validation.ValidatableJsonNode.verifySchemaAndValidate(validatableJson.scala:244)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$1$$anonfun$apply$8.apply(Shredder.scala:267)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$1$$anonfun$apply$8.apply(Shredder.scala:266)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$1.apply(Shredder.scala:266)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$$anonfun$extractAndValidateJson$1.apply(Shredder.scala:264)
  at scala.Option.map(Option.scala:145)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.extractAndValidateJson(Shredder.scala:264)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.extractContexts$1(Shredder.scala:101)
  at com.snowplowanalytics.snowplow.enrich.common.utils.shredder.Shredder$.shred(Shredder.scala:108)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$loadAndShred$1.apply(ShredJob.scala:83)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$loadAndShred$1.apply(ShredJob.scala:80)
  at scalaz.Validation$class.flatMap(Validation.scala:141)
  at scalaz.Success.flatMap(Validation.scala:347)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$.loadAndShred(ShredJob.scala:80)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$5.apply(ShredJob.scala:170)
  at com.snowplowanalytics.snowplow.enrich.hadoop.ShredJob$$anonfun$5.apply(ShredJob.scala:169)
  at com.twitter.scalding.MapFunction.operate(Operations.scala:58)
  at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
  at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
  at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
  at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
  at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:452)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)

The URL on which the method is called is not null. As the stack trace shows, the exception is thrown inside the class's own toExternalForm method, which toString calls.

Why does this happen?

This is the output of java -version on the cluster for AMI 3.8.0 (on both master and core nodes):

[hadoop@ip-xxx-xx-xx-xx ~]$ java -version
java version "1.7.0_76"
Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

And for AMI 3.7.0 (on both master and core nodes):

[hadoop@ip-xxx-xx-xx-xx ~]$ java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

Could the different JRE versions be to blame?


Solution

  • As reluctant as I am to make the claim, this appears to be a JVM bug. In the OpenJDK source for java.net.URL, the entirety of the toExternalForm() method is a delegation to a handler, which is a transient field:

    public String toExternalForm() {
        return handler.toExternalForm(this);
    }
    

    The only way this could throw an NPE is if handler is null. As far as I can tell, all constructor paths and the readObject(ObjectInputStream) method ensure that the handler field is set, and throw an exception (either MalformedURLException or IOException) if it can't be. For example:

    private synchronized void readObject(java.io.ObjectInputStream s)
         throws IOException, ClassNotFoundException
    {
        s.defaultReadObject();  // read the fields
        if ((handler = getURLStreamHandler(protocol)) == null) {
            throw new IOException("unknown protocol: " + protocol);
        }
    ...
    

    I note that there was a public JRE 7u79 release and would suggest trying that version if upgrading to Java 8 isn't feasible.
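
The readObject behaviour described above can be checked on a given JRE with a small round trip through standard Java serialization. The following is only a minimal sketch: the class name UrlRoundTrip and the example.com URL are placeholders, not anything from the job in the question. It exercises standard Java serialization only, so it says nothing about other ways a URL instance might be created or copied inside the job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.URL;

// Hypothetical helper class, for illustration only.
final class UrlRoundTrip {

    // Serializes the URL with ObjectOutputStream and reads it back with
    // ObjectInputStream, so the URL class's own deserialization logic runs.
    static URL roundTrip(URL original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (URL) in.readObject();
        }
    }

    // Convenience wrapper: parses the spec, round-trips the URL, and calls
    // toString() on the copy. Checked exceptions are rethrown unchecked.
    static String roundTripToString(String spec) {
        try {
            return roundTrip(new URL(spec)).toString();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed for " + spec, e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; substitute one that the job actually uses.
        System.out.println(roundTripToString("http://example.com/schemas/example"));
    }
}
```

If toString() on the deserialized copy throws a NullPointerException on the cluster's JRE, that would point at the deserialization path; if it prints the URL, the null handler is presumably being introduced somewhere else.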