I wrote a Spark application, which I compile with Maven and run with spark-submit. I want to monitor my application and collect metrics, so I set up a Prometheus container, but I'm struggling to expose a simple metric to it. I tried to follow the answer here, but I didn't understand what I should do with the spark.yml file.
This is my prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: spark-master
    static_configs:
      - targets: ['spark-master:8082']
When I look at the targets at http://localhost:9090/targets, I can see that the Prometheus target is up and the Spark target is down.
I think the answer depends upon what you want to monitor in Spark 2.1.
If it is JVM metrics, I don't think you can do that, for the simple reason that you do not know where the JVMs will be created in the Spark cluster. Even if you did know, several JVMs can be launched on the same node, so each JMX agent would need a dynamically assigned port, while the Prometheus server needs an exact scrape URL. That combination makes it impossible.
If the requirement is to measure business-specific metrics using the Pushgateway, then yes, you can do that, because the Prometheus server would be scraping one specific, known URL.
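For the Pushgateway route, the scrape side could look roughly like the sketch below. It assumes a Pushgateway container reachable under the hostname pushgateway on its default port 9091; the hostname and the job name are placeholders, not something taken from the question.

```yaml
scrape_configs:
  - job_name: pushgateway              # hypothetical job name
    honor_labels: true                 # keep the job/instance labels pushed by the application
    static_configs:
      - targets: ['pushgateway:9091']  # assumed hostname; 9091 is the Pushgateway default port
```

The Spark application then pushes its metrics to that address, so Prometheus only ever needs this one fixed target, no matter where the driver or executors run.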
Maybe you need to look at a more recent version, Spark 3.0, which supports Prometheus. Please follow this link: https://spark.apache.org/docs/latest/monitoring.html
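As a rough sketch of what that could look like on Spark 3.0 or later: the PrometheusServlet sink is enabled in conf/metrics.properties (the lines appear as comments below), and the scrape job then needs a matching metrics_path. The target is copied from the question and is an assumption; it has to be the master's web UI address, which listens on port 8080 by default in standalone mode unless it has been changed.

```yaml
# Assumes Spark 3.0+ with the PrometheusServlet sink enabled in conf/metrics.properties, e.g.:
#   *.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
#   *.sink.prometheusServlet.path=/metrics/prometheus
#   master.sink.prometheusServlet.path=/metrics/master/prometheus
scrape_configs:
  - job_name: spark-master
    metrics_path: /metrics/master/prometheus   # must match the path configured for the master
    static_configs:
      - targets: ['spark-master:8082']         # from the question; must be the master web UI port
```

Without a metrics_path, Prometheus requests /metrics on the target, which the master does not serve in Prometheus format, so the target would be reported as down.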