I'm running a Spark 3.0 application (Spark Structured Streaming) on Kubernetes and I'm trying to use the new native Prometheus metric sink. I'm able to make it work and get all the metrics described here.
However, the metrics I really need are the ones provided upon enabling the following config: spark.sql.streaming.metricsEnabled, as proposed in this Spark Summit presentation. Even with that config set to "true", I can't see any streaming metrics under /metrics/executors/prometheus as advertised. One thing to note is that I can see them under /metrics/json, so we know that the configuration was properly applied.
Why aren't streaming metrics sent to the Prometheus sink? Do I need to add some additional configuration? Is that not supported yet?
After quite a bit of investigation, I was able to make it work. In short, the Spark job k8s definition file needed one additional line to tell Spark where to find the metrics.properties config file.
Make sure to add the following line under sparkConf in the Spark job k8s definition file, and adjust it to your actual path. The path to the metrics.properties file should be set in your Dockerfile.
sparkConf:
  "spark.metrics.conf": "/etc/metrics/conf/metrics.properties"
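The contents of metrics.properties are not shown here. As a minimal sketch, assuming you want the Prometheus servlet sink enabled for every instance, the file could look like the PrometheusServlet example that ships in Spark's metrics.properties.template. The path value below is that template's default, not necessarily what this setup uses.

```properties
# Assumed contents, adapted from the PrometheusServlet example in
# Spark's conf/metrics.properties.template; adjust to your setup.
# Enable the Prometheus servlet sink for all instances (driver, executor, ...)
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
# Path under the UI port where the metrics are exposed
*.sink.prometheusServlet.path=/metrics/prometheus
```

With a file like this in place at the path given by spark.metrics.conf, the driver should expose the sink on its UI port (4040 in the config here).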
For reference, here's the rest of my sparkConf for metric-related config.
sparkConf:
  "spark.metrics.conf": "/etc/metrics/conf/metrics.properties"
  "spark.ui.prometheus.enabled": "true"
  "spark.kubernetes.driver.annotation.prometheus.io/scrape": "true"
  "spark.kubernetes.driver.annotation.prometheus.io/path": "/metrics/executors/prometheus/"
  "spark.kubernetes.driver.annotation.prometheus.io/port": "4040"
  "spark.sql.streaming.metricsEnabled": "true"
  "spark.metrics.appStatusSource.enabled": "true"
  "spark.kubernetes.driver.service.annotation.prometheus.io/scrape": "true"
  "spark.kubernetes.driver.service.annotation.prometheus.io/path": "/metrics/prometheus/"
  "spark.kubernetes.driver.service.annotation.prometheus.io/port": "4040"