I added a metrics.properties file to the resources directory (Maven project) with a CSV sink. Everything is fine when I run the Spark app locally - metrics appear. But when I deploy the same fat jar to Amazon EMR, I do not see any attempt to write metrics into the CSV sink. So I want to check at runtime what settings the Spark metrics subsystem has actually loaded. Is there any way to do this?
I looked into SparkEnv.get.metricsSystem but didn't find anything useful there.
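For context, the closest I have found to inspecting the metrics configuration at runtime is dumping the metrics-related entries from the active SparkConf. This is only a sketch: MetricsSystem keeps its parsed MetricsConfig private, so this shows just the settings that were supplied through SparkConf (e.g. via spark.metrics.conf.* keys), not ones read from a properties file.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// List every metrics-related setting that reached the SparkConf.
// Settings loaded from a metrics.properties file will NOT appear here.
spark.sparkContext.getConf.getAll
  .filter { case (key, _) => key.startsWith("spark.metrics") }
  .foreach { case (key, value) => println(s"$key = $value") }
```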
That is most likely because Spark on EMR is not picking up your custom metrics.properties file from the resources dir of the fat jar.
On EMR the preferred way to configure this is through the EMR Configurations API, in which you pass the classification and its properties as embedded JSON. For the Spark metrics subsystem, here is an example that modifies a couple of metrics settings:

[
  {
    "Classification": "spark-metrics",
    "Properties": {
      "*.sink.csv.class": "org.apache.spark.metrics.sink.CsvSink",
      "*.sink.csv.period": "1"
    }
  }
]
You can use this JSON when creating the EMR cluster through the Amazon Console, or pass it through the SDK or CLI.
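For example, with the AWS CLI you can pass the JSON via the --configurations flag. This is a sketch: the cluster name, release label, and instance settings below are placeholders, and configurations.json is assumed to contain the JSON shown above.

```shell
# configurations.json holds the spark-metrics classification JSON above.
aws emr create-cluster \
  --name "spark-metrics-demo" \
  --release-label emr-5.30.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --configurations file://./configurations.json \
  --use-default-roles
```

EMR then writes these properties into the cluster-wide metrics configuration, so the CSV sink is active without bundling a metrics.properties file in the jar.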