I want to use the Spark REST API to get metrics and publish them to CloudWatch. But the REST API URL looks like:
val url = "http://<host>:4040/api/v1/applications/<app-name>/stages"
If I supply the master host and app ID it works, but how can I use this inside a job and figure out the master host and app name dynamically? Is there any way to get that information?
Using Spark 2.1
Tried:
import org.apache.spark.sql.SparkSession
import org.json4s._
import org.json4s.jackson.JsonMethods._
import scala.io.Source.fromURL
val id = spark.sparkContext.applicationId
val url = spark.sparkContext.uiWebUrl.get
case class SparkStage(name: String, shuffleWriteBytes: Long, memoryBytesSpilled: Long, diskBytesSpilled: Long)
val path = url + "/api/v1/applications/" + id + "/stages"
implicit val formats = DefaultFormats
val json = fromURL(path).mkString
val stages: List[SparkStage] = parse(json).extract[List[SparkStage]]
I am getting:
java.io.IOException: Server returned HTTP response code: 500 for URL:
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1876)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.URL.openStream(URL.java:1045)
at scala.io.Source$.fromURL(Source.scala:141)
at scala.io.Source$.fromURL(Source.scala:131)
... 64 elided
If you know the host, you can query the applications endpoint (http://<host>:4040/api/v1/applications) and parse the result to get the application ID.
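A minimal sketch of that lookup, assuming json4s is on the classpath (it ships with Spark) and that the response of /api/v1/applications is a JSON array of objects with an "id" field. The helper names applicationIds and fetchApplicationIds are hypothetical, and port 4040 is only the default UI port:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._
import scala.io.Source

// Hypothetical helper: pull the "id" of every application out of the
// JSON array returned by /api/v1/applications.
def applicationIds(json: String): List[String] = parse(json) match {
  case JArray(apps) =>
    apps.flatMap {
      case JObject(fields) =>
        fields.collectFirst { case JField("id", JString(id)) => id }
      case _ => None
    }
  case _ => Nil // not an array: nothing to extract
}

// Hypothetical helper: fetch the list from a known host.
// 4040 is the default UI port and may differ in your deployment.
def fetchApplicationIds(host: String, port: Int = 4040): List[String] = {
  val src = Source.fromURL(s"http://$host:$port/api/v1/applications")
  try applicationIds(src.mkString) finally src.close()
}
```

Keeping the parsing separate from the HTTP call makes it easy to check the extraction against a saved response before pointing it at a live UI.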
To get the application ID and the host from within the application, use the respective SparkContext members (applicationId and uiWebUrl):
val spark: SparkSession
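As a rough sketch of how these two values could be combined inside a job: the helper names stagesUrl and stagesUrlFor are hypothetical, and the code assumes Spark 2.x, where SparkContext.uiWebUrl is an Option[String] that is empty when the UI is disabled.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: build the stages endpoint from the UI base URL
// and the application ID.
def stagesUrl(uiWebUrl: String, appId: String): String =
  s"${uiWebUrl.stripSuffix("/")}/api/v1/applications/$appId/stages"

// Hypothetical helper: resolve the endpoint for a running session.
// Returns None when the UI (and therefore the REST API) is not available.
def stagesUrlFor(spark: SparkSession): Option[String] = {
  val sc = spark.sparkContext
  sc.uiWebUrl.map(base => stagesUrl(base, sc.applicationId))
}
```

Working with the Option instead of calling .get avoids an exception in jobs that run with the UI turned off.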