
Getting the application ID from SparkR to create a Spark UI URL


From the SparkR shell, I'd like to generate a link to the Spark UI while running in YARN mode. Normally the Spark UI is served on port 4040, but in YARN mode it appears to be at something like [host]:9046/proxy/application_1234567890123_0001/, where the last path component is the unique application ID.

Other Stack Overflow answers show how to get the application ID from the Scala and Python shells. How do we get the application ID from SparkR?

As a stab in the dark, I tried SparkR:::callJMethod(sc, "applicationId"), but it didn't work.

I also tried something along the lines of system("yarn application -list"), but that doesn't seem to work from RStudio and has other limitations.


Solution

  • You can reach the Spark UI directly by following a link from the YARN web UI. In the YARN web UI (port 8088), click 'Running Applications'; this lists your application with a link to its application status page.

    If you want to use callJMethod to get the application ID, you can use something like SparkR:::callJMethod(SparkR:::callJMethod(sc, "sc"), "applicationId").

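    The nested call is needed because sc in SparkR is a handle to a JavaSparkContext, and applicationId is only available on the Scala SparkContext. The inner call to the JavaSparkContext's sc method returns that Scala SparkContext, and the outer call reads applicationId from it.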
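A minimal sketch combining the nested callJMethod call with the proxy URL format from the question. The helper spark_ui_url, the placeholder host "my-resourcemanager-host", the http:// scheme and the default port 9046 are assumptions for illustration only. The actual proxy host and port depend on your cluster's YARN configuration. The SparkR calls run only when a SparkR shell's sc handle exists, so the helper can also be loaded on its own.

```r
# Hypothetical helper: build the Spark UI URL served through the YARN web proxy.
# ASSUMPTIONS: http:// scheme and default port 9046 (taken from the question's example);
# adjust both to match your cluster's YARN proxy settings.
spark_ui_url <- function(app_id, proxy_host, proxy_port = 9046L) {
  sprintf("http://%s:%d/proxy/%s/", proxy_host, proxy_port, app_id)
}

# Only attempt the JVM calls inside a SparkR shell, where `sc` is the
# JavaSparkContext handle created at startup.
if (exists("sc")) {
  # JavaSparkContext.sc() returns the underlying Scala SparkContext ...
  scala_sc <- SparkR:::callJMethod(sc, "sc")
  # ... which is the object that exposes applicationId.
  app_id <- SparkR:::callJMethod(scala_sc, "applicationId")
  # "my-resourcemanager-host" is a placeholder; replace it with your proxy host.
  cat(spark_ui_url(app_id, proxy_host = "my-resourcemanager-host"), "\n")
}
```

Keeping the URL construction in a plain R function separates the cluster-specific parts (host, port) from the JVM lookup, so you can change the former without touching the callJMethod calls.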