hadoop logging apache-spark cloudera hadoop-yarn

Where are logs in Spark on YARN?

I'm new to Spark. I can now run Spark 0.9.1 on YARN (2.0.0-cdh4.2.1), but I can't find any logs after execution.

I use the following command to run a Spark example, but the logs don't show up in the history server the way they do for a normal MapReduce job.

SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.2.1.jar \
./bin/spark-class org.apache.spark.deploy.yarn.Client --jar ./spark-example-1.0.0.jar \
--class SimpleApp --args yarn-standalone  --num-workers 3 --master-memory 1g \
--worker-memory 1g --worker-cores 1

Where can I find the logs (stderr/stdout)?

Is there somewhere in the configuration to set this? I did find this line in the console output:

14/04/14 18:51:52 INFO Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster --class SimpleApp --jar ./spark-example-1.0.0.jar --args 'yarn-standalone' --worker-memory 1024 --worker-cores 1 --num-workers 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr

Note the redirections at the end of that line: 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr

Where can LOG_DIR be set?

Solution

There is a good article covering this question:

Running Spark on YARN - see the section "Debugging your Application". It gives a decent explanation with all the required examples.

The only thing you need to do to get the history server working correctly for Spark is to close your Spark context in your application (call stop() on the SparkContext when the job is done). Otherwise, the application history server does not see the application as COMPLETE and does not show anything for it (the history UI is still accessible, but the application is not visible in it).
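
As a rough illustration of closing the context, here is a minimal sketch of the pattern. It is written in Python only for brevity; the SimpleApp from the question is presumably a JVM application, where the equivalent call is also stop() on the SparkContext. The names run_and_stop, count_evens and main are hypothetical helpers made up for this example, not part of Spark.

```python
def run_and_stop(sc, job):
    """Run job(sc) and always stop the context, even if the job raises.

    `sc` is expected to be a SparkContext (anything with a stop() method works).
    """
    try:
        return job(sc)
    finally:
        # Without this call the application may never be reported as
        # completed, so the history server has nothing to show for it.
        sc.stop()


def count_evens(sc):
    # Hypothetical job body: count the even numbers in a small range.
    return sc.parallelize(range(10)).filter(lambda x: x % 2 == 0).count()


def main():
    # Imported here so the helper above can be loaded without pyspark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="SimpleApp")
    print(run_and_stop(sc, count_evens))
```

Calling main() from your entry point runs the job and stops the context on both the success and the failure path. The try/finally is the point here: a stop() placed only on the last line of the program is skipped whenever the job throws.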