Tags: java, maven, hadoop, mapreduce, amazon-emr

MapReduce on EMR does not contact RMProxy and gets stuck waiting for resourcemanager?


I'm running MapReduce on EMR with Hadoop 2.7.3, a stock install from AWS; the jar was built with the Maven Shade plugin. The job gets stuck indefinitely waiting for the ResourceManager, and I've found absolutely nothing in the log files or online.

In job.waitForCompletion, the output reaches lines like the following:
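For reference, the driver is a standard MapReduce submission of roughly this shape (class names and paths here are illustrative, not my actual code; per the solution below, the real driver did call job.setJar()):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {  // illustrative name
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "my-job");
        job.setJar("target/myjob-shaded.jar");  // illustrative path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Hangs here, right after the "Connecting to ResourceManager" log line:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```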

2020-01-25 05:52:41,346 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl (main): Timeline service address: http://ip-172-31-13-41.us-west-2.compute.internal:8188/ws/v1/timeline/
2020-01-25 05:52:41,356 INFO org.apache.hadoop.yarn.client.RMProxy (main): Connecting to ResourceManager at ip-172-31-13-41.us-west-2.compute.internal/172.31.13.41:8032

Then it just sits there: it never makes progress, and the cluster has to be shut down or the task killed manually.

Interestingly, I can reproduce this step locally by running hadoop jar <arguments>, but I have no idea what is causing it.

After 25 minutes or so, it fails while unpacking the jar, producing output of the form:


AM Container for appattempt_1580058321574_0005_000001 exited with exitCode: -1000
For more detailed output, check application tracking page:http://192.168.2.21:8088/cluster/app/application_1580058321574_0005Then, click on links to logs of each attempt.
Diagnostics: /Users/gbronner/hadoopdata/yarn/local/usercache/gbronner/appcache/application_1580058321574_0005/filecache/11_tmp/tmp_job.jar (Is a directory)
java.io.FileNotFoundException: /Users/gbronner/hadoopdata/yarn/local/usercache/gbronner/appcache/application_1580058321574_0005/filecache/11_tmp/tmp_job.jar (Is a directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:225)
at java.util.zip.ZipFile.<init>(ZipFile.java:155)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:94)
at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:297)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt

This happens both on AWS EMR and locally. I've never seen this error before, and I'm using EMR straight out of the box.

Any ideas as to why this would happen? A bad jar? It's potentially related to another unanswered question here.


Solution

  • After exhaustively trying hundreds of experiments, it appears that the offending line was

    job.setJar().

    Why, I do not know. It works fine under IntelliJ, but reliably crashes when run via the hadoop command, both locally and on EMR.
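    One consistent reading of the stack trace (not confirmed beyond removing the call): the "(Is a directory)" message means the resource YARN localized as job.jar was a directory rather than a jar file, e.g. an exploded target/classes tree, which FSDownload.unpack() then failed to unzip. The usual idiom is to drop the explicit setJar() call and let Hadoop derive the jar from the driver class instead (class name below is a placeholder):

    ```java
    // Before: explicit path; if it resolves to a directory, YARN
    // localizes a directory as job.jar and FSDownload.unpack()
    // fails with "(Is a directory)".
    // job.setJar("/path/to/job.jar");

    // After: derive the jar from the class's own code source. This
    // works with a Maven-shaded jar launched via `hadoop jar`.
    job.setJarByClass(MyDriver.class);  // MyDriver is a placeholder
    ```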