Tags: azure, apache-spark, azure-hdinsight, livy

Fetching the Spark YARN logs from Azure HDInsight


I am currently submitting Spark jobs to an Azure HDInsight cluster through Livy. After a job finishes, I look in the Spark History Server for the YARN logs.

The Livy log for each Spark job does not include the YARN logs.

Can we fetch the Spark YARN logs from Azure HDInsight programmatically? Is there a REST call or a custom tool in Azure that fetches the YARN logs?


Solution

  • To investigate this issue further, could you please provide more information about the scenario:

    • How exactly are you submitting the Spark jobs to Azure HDInsight?
    • Are you following an article? If so, please provide a link to it, or share the exact steps you follow.
    • When you launch the YARN UI from the Ambari UI, can you see the application_id associated with the Spark job you submitted?
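To find the application_id from the last question without opening the YARN UI, you can read it from the Livy batch object. The following is a minimal sketch, assuming the batch object returned by GET https://<your_hdi_url>/livy/batches/<id> contains an `appId` field and an `appInfo` map with a `driverLogUrl` entry, as described in the Apache Livy REST API documentation. The helper names (`basic_auth_header`, `batch_url`, `get_batch`, `yarn_ids`) are hypothetical, and only Python's standard library is used.

```python
import base64
import json
import urllib.request
from typing import Optional, Tuple


def basic_auth_header(user: str, password: str) -> str:
    """Value for an HTTP basic authentication header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def batch_url(cluster_url: str, batch_id: int) -> str:
    """Livy batch URL on HDInsight: https://<your_hdi_url>/livy/batches/<id>."""
    return f"{cluster_url.rstrip('/')}/livy/batches/{batch_id}"


def get_batch(cluster_url: str, batch_id: int, user: str, password: str) -> dict:
    """GET the Livy batch object (performs a real HTTP call)."""
    req = urllib.request.Request(
        batch_url(cluster_url, batch_id),
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def yarn_ids(batch: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (YARN application id, driver log URL); either may be None."""
    # Assumption: Livy reports these fields once YARN has accepted the job.
    app_info = batch.get("appInfo") or {}
    return batch.get("appId"), app_info.get("driverLogUrl")
```

Usage would look like `yarn_ids(get_batch("https://<your_hdi_url>", <id of your job>, "admin", "<password>"))`. The driver log URL typically points at a worker node's YARN log page, so it may only be reachable from inside the cluster's network.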

    Meanwhile, you can check out the article "Debug Apache Spark jobs running on Azure HDInsight".

    As for fetching the logs programmatically:

    Use the Livy REST API (see the reference below) to submit remote jobs to HDInsight Spark clusters. All task operations conform to the HTTP/1.1 protocol. Make sure you authenticate with the Spark cluster management endpoint using HTTP basic authentication with your Spark administrator credentials.


    Reference: Azure HDInsight Spark - Remote Job Submission REST API
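Remote job submission with basic authentication can be sketched as follows. This is a minimal sketch, assuming Livy's `POST /batches` endpoint is exposed under https://<your_hdi_url>/livy and accepts a JSON body with `file`, optional `className` and optional `args`. The `X-Requested-By` header is an assumption based on HDInsight's curl examples, which add it because Livy's CSRF protection can reject POST requests without it. The function names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import List, Optional


def build_submit_request(cluster_url: str, user: str, password: str,
                         file: str, class_name: Optional[str] = None,
                         args: Optional[List[str]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /livy/batches request for a Spark job."""
    body: dict = {"file": file}
    if class_name:
        body["className"] = class_name
    if args:
        body["args"] = args
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{cluster_url.rstrip('/')}/livy/batches",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",  # HTTP basic auth
            "Content-Type": "application/json",
            # Assumption: required by Livy's CSRF protection on POST.
            "X-Requested-By": user,
        },
    )


def submit(req: urllib.request.Request) -> int:
    """Send the request and return the Livy batch id from the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]
```

Building the request separately from sending it keeps the URL, body and headers easy to inspect before anything reaches the cluster. The returned batch id is the `<id of your job>` used in the log URL below.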

    To get the log of a batch job, send a GET request to the Livy endpoint in this format: https://<your_hdi_url>/livy/batches/<id of your job>/log


    Reference: Get the full log of a batch job.
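Because the log endpoint returns the log in pages, retrieving all of it takes a loop. The following is a minimal sketch, assuming the `from` (offset) and `size` (maximum lines) query parameters and a JSON response with `log` (list of lines) and `total` fields, as in Apache Livy's REST API. The function names are hypothetical. `get_json` is passed in so the paging logic can be exercised without a cluster. Note that this returns only what Livy captured for the batch, which, as the question observes, may not include the YARN container logs.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, List


def log_url(cluster_url: str, batch_id: int, start: int, size: int) -> str:
    """GET /livy/batches/<id>/log with Livy's `from` and `size` paging parameters."""
    query = urllib.parse.urlencode({"from": start, "size": size})
    return f"{cluster_url.rstrip('/')}/livy/batches/{batch_id}/log?{query}"


def http_get_json(url: str, user: str, password: str) -> dict:
    """Authenticated GET (HTTP basic auth) that decodes the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_full_log(get_json: Callable[[str], dict], cluster_url: str,
                   batch_id: int, page_size: int = 100) -> List[str]:
    """Page through the batch log until a short page or the reported total."""
    lines: List[str] = []
    start = 0
    while True:
        page = get_json(log_url(cluster_url, batch_id, start, page_size))
        chunk = page.get("log") or []
        lines.extend(chunk)
        start += len(chunk)
        total = page.get("total")
        if len(chunk) < page_size or (total is not None and start >= total):
            break
    return lines
```

Against a real cluster you would pass something like `lambda url: http_get_json(url, "admin", "<password>")` as `get_json`.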