apache-spark, hadoop-yarn, emr, amazon-cloudwatch

How to get cluster information to call REST API (from the driver)?


I want to use the Spark REST API to get metrics and publish them to CloudWatch. But the REST API URL looks like:

 val url = "http://<host>:4040/api/v1/applications/<app-name>/stages"

If I hard-code the master host and app id it works, but how can I use this inside a job and figure out the master host and app name dynamically? Is there any way to get that information?

Using Spark 2.1

Tried:

import org.apache.spark.sql.SparkSession
import org.json4s.DefaultFormats
import org.json4s.jackson.JsonMethods.parse
import scala.io.Source.fromURL

val id = spark.sparkContext.applicationId   // e.g. application_1515337161733_0001
val url = spark.sparkContext.uiWebUrl.get   // e.g. http://<driver-host>:4040

case class SparkStage(name: String, shuffleWriteBytes: Long, memoryBytesSpilled: Long, diskBytesSpilled: Long)

val path = url + "/api/v1/applications/" + id + "/stages"

implicit val formats = DefaultFormats
val json = fromURL(path).mkString
val stages: List[SparkStage] = parse(json).extract[List[SparkStage]]

I am getting:

java.io.IOException: Server returned HTTP response code: 500 for URL: http://112.21.2.151:4040/api/v1/applications/application_1515337161733_0001
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1876)
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
  at java.net.URL.openStream(URL.java:1045)
  at scala.io.Source$.fromURL(Source.scala:141)
  at scala.io.Source$.fromURL(Source.scala:131)
  ... 64 elided

Solution

  • If you know the host, you can query the applications endpoint:

    http://localhost:4040/api/v1/applications
    

    and parse the result to get the application id (see the sketch after this answer).

    To get the applicationId and host from within the application itself, use the respective SparkContext methods:

    val spark: SparkSession

    spark.sparkContext.applicationId   // e.g. application_1515337161733_0001
    spark.sparkContext.uiWebUrl        // Option[String], e.g. Some(http://<driver-host>:4040)
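
    A minimal sketch of the first approach, assuming the driver UI is reachable at http://localhost:4040 and json4s (which ships with Spark) is on the classpath. SparkApp is a hypothetical helper case class; the /api/v1/applications entries contain further fields (e.g. attempts) that are simply ignored here:

    import org.json4s.DefaultFormats
    import org.json4s.jackson.JsonMethods.parse
    import scala.io.Source.fromURL

    // Hypothetical helper holding only the fields we care about
    case class SparkApp(id: String, name: String)

    implicit val formats = DefaultFormats

    val host = "http://localhost:4040"   // assumed driver/UI host; adjust for your cluster

    // List the applications this UI knows about and extract their ids
    val appsJson = fromURL(host + "/api/v1/applications").mkString
    val apps = parse(appsJson).extract[List[SparkApp]]

    // Query the stages endpoint for the first application found
    apps.headOption.foreach { app =>
      val stagesJson = fromURL(s"$host/api/v1/applications/${app.id}/stages").mkString
      println(stagesJson)
    }

    From inside a running job it is usually simpler to skip the lookup entirely and build the URL from spark.sparkContext.uiWebUrl.get and spark.sparkContext.applicationId, as the question's own snippet does.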