Tags: hadoop, apache-spark, mapreduce, hadoop-yarn, sqoop

Hadoop - Sqoop job stuck on ACCEPTED when there is a spark job RUNNING


At the moment I have a Spark job (Java) that always needs to be running. It doesn't need many resources. However, whenever I run a Sqoop job (MapReduce), the job gets stuck in the ACCEPTED state with "waiting for AM container to be allocated, launched and register with RM".

I checked Ambari and the Spark scheduling config is set to FAIR. As a test, I ran two copies of the same Spark job and both ran without problems (state RUNNING on both), so there should be enough cores and memory left for the MapReduce job to run.
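
A quick way to double-check where the cluster's resources are actually going (a minimal sketch; the queue name below is an assumption, replace it with the queue your jobs run in):

yarn application -list -appStates RUNNING,ACCEPTED   # everything running or still waiting for an AM container
yarn queue -status default                           # per-queue capacity and current usage

The same information is also visible on the ResourceManager UI under "Scheduler".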

Spark Submit command:

/usr/hdp/current/spark-client/bin/spark-submit \
  --class com.some.App \
  --master yarn-cluster \
  --deploy-mode cluster \
  --num-executors 1 \
  /path/to/file.jar "some.server:6667" "Some_App" "Some_App_Parser" "some.server" \
  "jdbc:jtds:sqlserver://some.server:1433/HL7_Metadata"
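
For comparison, a hedged sketch of the same submit with an explicit, small resource footprint, so the always-running application leaves room for other ApplicationMasters (the memory and core values are illustrative, not taken from the original job; --master yarn with --deploy-mode cluster is the current form of the deprecated yarn-cluster master):

/usr/hdp/current/spark-client/bin/spark-submit \
  --class com.some.App \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  /path/to/file.jar "some.server:6667" "Some_App" "Some_App_Parser" "some.server" \
  "jdbc:jtds:sqlserver://some.server:1433/HL7_Metadata"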

My Sqoop command (I added a memory limit, but it didn't help):

sqoop import -D mapreduce.map.memory.mb=2048 \
    --connect "jdbc:sqlserver://some.server\SQL2012;database=SomeDB;username=someUser;password=somePass" \
    --query "SELECT SOMETHING WHERE \$CONDITIONS" \
    --fields-terminated-by \\002 \
    --escaped-by \\ \
    --check-column Message_Audit_Log_Id \
    --incremental append \
    --last-value 1 \
    --split-by Message_Audit_Log_Id \
    --target-dir /target/path/
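
Note that -D mapreduce.map.memory.mb only sizes the map-task containers; the container this job is actually waiting for is the MapReduce ApplicationMaster, which is sized by a separate property. A hedged variant (1024 MB is an illustrative value, not from the original post):

sqoop import \
    -D yarn.app.mapreduce.am.resource.mb=1024 \
    -D mapreduce.map.memory.mb=2048 \
    ...   # remaining arguments as in the command above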

Screenshots for reference (not included here): both jobs in ACCEPTED state, the Spark job RUNNING, and the YARN UI.


Solution

  • I found help on Hortonworks.

    I had to change yarn.scheduler.capacity.maximum-am-resource-percent from 0.2 to 0.4 (see the sketch after this list).

    After this, I could run the Sqoop MapReduce job and my Spark application at the same time.

    Link to the answer: https://community.hortonworks.com/questions/147101/hadoop-sqoop-job-stuck-on-accepted-when-there-is-a.html
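
The same change, sketched from the HDP side (0.4 is the value from the linked answer; the refresh command assumes the property is edited directly in capacity-scheduler.xml, whereas changing it through Ambari restarts YARN for you):

# yarn.scheduler.capacity.maximum-am-resource-percent caps the share of the
# cluster that ApplicationMasters may occupy. With 0.2, the always-on Spark AM
# was already using most of that share, so the MapReduce AM could not start.
#
# In Ambari: YARN > Configs > set the property to 0.4, then restart YARN.
# If editing capacity-scheduler.xml by hand instead, refresh the queues:
yarn rmadmin -refreshQueues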