apache-spark, hadoop, hadoop-yarn, tail, apache-spark-2.0

How to tail yarn logs?


I am submitting a Spark job using the command below. I want to tail the YARN log by application ID, similar to the `tail` command on a Linux box.

export SPARK_MAJOR_VERSION=2
nohup spark-submit --class "com.test.TestApplication" \
  --name TestApp \
  --queue queue1 \
  --properties-file application.properties \
  --files "hive-site.xml,tez-site.xml,hbase-site.xml,application.properties" \
  --master yarn \
  --deploy-mode cluster \
  Test-app.jar > /tmp/TestApp.log &
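Since the submit runs in cluster mode with its output redirected to `/tmp/TestApp.log`, the YARN application ID can usually be pulled from that file once the client reports the submission. A minimal sketch (the sample log line is seeded here only so the snippet runs standalone; in practice you would grep the real driver log):

```shell
# Seed a sample line of the kind spark-submit prints in cluster mode,
# so this snippet is runnable on its own; normally /tmp/TestApp.log is
# the file produced by the nohup redirect above.
echo "INFO Client: Submitted application application_1700000000000_0042" > /tmp/TestApp.log

# Extract the first YARN application ID found in the driver log.
APP_ID=$(grep -oE 'application_[0-9]+_[0-9]+' /tmp/TestApp.log | head -n 1)
echo "$APP_ID"   # application_1700000000000_0042
```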

Solution

  • Not easily.

    "YARN logs" aren't really stored by YARN itself: while an application is running, they live on the NodeManager hosts where the Spark executors run. If YARN log aggregation is enabled, the logs are collected into HDFS after the application finishes and are also available from the Spark History Server.
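With aggregation enabled, the closest built-in tool is YARN's own CLI, which dumps the aggregated container logs for an application ID (after the app finishes, or while it is running on newer Hadoop versions with rolling aggregation). A hedged sketch, with an illustrative application ID and a guard so it is safe to run off-cluster:

```shell
APP_ID="application_1700000000000_0042"   # substitute your real application ID

# Guard so the snippet is harmless on a machine without the yarn CLI;
# on a cluster node this prints the aggregated container logs.
if command -v yarn >/dev/null 2>&1; then
  # Dump all aggregated container logs for the application:
  yarn logs -applicationId "$APP_ID"

  # On Hadoop 2.8+, restrict the output to a single log file per container,
  # e.g. just stderr:
  yarn logs -applicationId "$APP_ID" -log_files stderr
fi
```

Note that `yarn logs` has no follow mode, so this is a snapshot rather than a true `tail -f`.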

    The common deployment pattern is to configure Spark's log4j properties to write to files, run a log forwarder (such as Filebeat, Fluentd, or a Splunk forwarder) that ships those files into a search backend like Elasticsearch, Solr, Graylog, or Splunk, and then tail, search, and analyze the log messages from those tools rather than from a CLI.
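As a starting point for that pattern, a `log4j.properties` along these lines can be shipped to the containers via `--files` and pointed at with `spark.driver.extraJavaOptions` / `spark.executor.extraJavaOptions` (`-Dlog4j.configuration=file:log4j.properties`). This is a hedged sketch for Spark 2.x (which uses log4j 1.x); the appender name and file name are illustrative assumptions:

```properties
# Route all Spark logging through a rolling file appender that a log
# forwarder can pick up. Appender name "rolling" and the file name
# "spark-app.log" are assumptions, not fixed conventions.
log4j.rootCategory=INFO, rolling

log4j.appender.rolling=org.apache.log4j.RollingFileAppender
# spark.yarn.app.container.log.dir is set by YARN inside each container.
log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark-app.log
log4j.appender.rolling.MaxFileSize=50MB
log4j.appender.rolling.MaxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Writing under `spark.yarn.app.container.log.dir` keeps the file inside the container's YARN log directory, so it is also picked up by YARN log aggregation.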