Tags: amazon-web-services, apache-spark, amazon-emr, metrics

Is there a way to get information at runtime about the Spark metrics configuration?


I added a metrics.properties file with a CSV sink to the resources directory of my Maven project. Everything works when I run the Spark app locally: the metrics appear. But when I run the same fat jar on Amazon EMR, I see no attempts to write metrics to the CSV sink. So I want to check at runtime which settings the Spark metrics subsystem has loaded. Is there any way to do this? I looked into SparkEnv.get.metricsSystem but didn't find anything.


Solution

  • This is most likely because Spark on EMR is not picking up the custom metrics.properties file from the resources directory of your fat jar.

    On EMR, the preferred way to configure this is the EMR Configurations API: you pass a configuration classification and its properties as embedded JSON.

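    • For the Spark metrics subsystem the classification is spark-metrics. Here is an example that enables the CSV sink with a reporting period of 1 (seconds, the CsvSink default unit):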
      [
        {
          "Classification": "spark-metrics",
          "Properties": {
            "*.sink.csv.class": "org.apache.spark.metrics.sink.CsvSink",
            "*.sink.csv.period": "1"
          }
        }
      ]
    

    You can supply this JSON when creating the EMR cluster, either in the Amazon EMR console or through the AWS SDK.
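The question also asked how to see, at runtime, which metrics settings were loaded. A partial check is possible from the driver: metrics options passed through the Spark configuration show up as `spark.metrics.conf` (the path of a custom metrics file) or as `spark.metrics.conf.*` keys. The sketch below assumes PySpark, where `SparkConf.getAll()` returns `(key, value)` pairs; `metrics_settings` is a hypothetical helper. It only shows what reached the Spark configuration, not the contents of a `metrics.properties` file that Spark located by other means.

```python
from typing import Dict, Iterable, Tuple

METRICS_PREFIX = "spark.metrics."


def metrics_settings(conf_pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only metrics-related entries from Spark configuration pairs.

    Hypothetical helper: returns `spark.metrics.conf` (path to a custom
    metrics file) and any other `spark.metrics.*` key, sorted by key.
    """
    return {
        key: value
        for key, value in sorted(conf_pairs)
        if key.startswith(METRICS_PREFIX)
    }


if __name__ == "__main__":
    # In a PySpark job, pass spark.sparkContext.getConf().getAll() instead.
    # These hard-coded pairs (including the file path) are illustrative only.
    sample = [
        ("spark.app.name", "metrics-test"),
        ("spark.metrics.conf", "/path/to/metrics.properties"),
        ("spark.metrics.conf.*.sink.csv.class",
         "org.apache.spark.metrics.sink.CsvSink"),
    ]
    for key, value in metrics_settings(sample).items():
        print(f"{key} = {value}")
```

On a running job, `metrics_settings(spark.sparkContext.getConf().getAll())` lists these entries; an empty result means no metrics settings arrived through the Spark configuration itself.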
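If the sink is already defined in the `metrics.properties` file under the project's resources, the same key/value pairs can be carried over into the `spark-metrics` classification instead of being retyped. A minimal Python sketch, assuming the file uses plain `key=value` lines; `parse_properties` and `to_emr_configurations` are hypothetical helpers, and the parser deliberately ignores Java properties features such as line continuations, escapes, and `:` separators.

```python
import json
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple `key=value` lines from a metrics.properties file.

    Minimal sketch: skips blank lines and `#`/`!` comments and splits on the
    first `=`; continuations, escapes and `:` separators are not handled.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an `=`
            props[key.strip()] = value.strip()
    return props


def to_emr_configurations(props: Dict[str, str]) -> List[dict]:
    """Wrap the properties in a single `spark-metrics` classification."""
    return [{"Classification": "spark-metrics", "Properties": props}]


if __name__ == "__main__":
    # Hypothetical contents of src/main/resources/metrics.properties.
    sample = """
    # CSV sink for all instances
    *.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
    *.sink.csv.period=1
    """
    print(json.dumps(to_emr_configurations(parse_properties(sample)), indent=2))
```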
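For the SDK or CLI route, the spark-metrics classification shown in the answer can be kept as data and written to a JSON file. A sketch in Python, assuming the file is then handed to the AWS CLI with `aws emr create-cluster ... --configurations file://configurations.json`; `write_configurations`, `SPARK_METRICS_CONFIG`, and the file name are hypothetical.

```python
import json
from pathlib import Path
from typing import List

# The spark-metrics classification from the answer, as Python data.
SPARK_METRICS_CONFIG: List[dict] = [
    {
        "Classification": "spark-metrics",
        "Properties": {
            "*.sink.csv.class": "org.apache.spark.metrics.sink.CsvSink",
            "*.sink.csv.period": "1",
        },
    }
]


def write_configurations(path: Path, configurations: List[dict]) -> Path:
    """Serialize EMR configurations to a JSON file and return its path."""
    path.write_text(json.dumps(configurations, indent=2))
    return path


if __name__ == "__main__":
    out = write_configurations(Path("configurations.json"), SPARK_METRICS_CONFIG)
    print(f"wrote {out}")
```

With boto3, the same list can instead be passed directly as `Configurations=SPARK_METRICS_CONFIG` to `boto3.client("emr").run_job_flow(...)`, together with the other required cluster parameters, skipping the file.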