This defines several sinks, metrics and so on. But how are they collected?
I added JmxSink to the metrics.properties file and enabled all instance metrics (master, applications, worker, executor, driver, shuffleService, applicationMaster).
Where should I collect the metrics: do I need to connect to every cluster node, or only to the driver node?
Spark metrics do not have to be pulled from individual nodes. If a sink host is configured in the metrics.properties file, each instance pushes its metrics to that host at the configured interval (in seconds). Our setup uses GraphiteSink to collect the metrics; the configuration it requires is shown below (it sits alongside the other settings you mentioned):
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=<graphite-server-host>
*.sink.graphite.port=<graphite-server-port>
*.sink.graphite.period=10
*.sink.graphite.prefix=dev
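
Since every instance pushes to the same Graphite host, a single receiver sees the metrics from all nodes. As a rough illustration only, assuming the sink sends Graphite's plaintext protocol (one `<path> <value> <timestamp>` line per metric), a received line could be parsed like this in Python. The function name and the sample metric path are hypothetical, not taken from a real cluster.

```python
from typing import NamedTuple


class GraphiteSample(NamedTuple):
    path: str        # dotted metric name, starting with the configured prefix
    value: float     # metric value
    timestamp: int   # Unix time in seconds


def parse_graphite_line(line: str) -> GraphiteSample:
    """Parse one Graphite plaintext line: '<path> <value> <timestamp>'."""
    path, value, timestamp = line.strip().split()
    return GraphiteSample(path, float(value), int(timestamp))


# Hypothetical line as a driver might push it with prefix "dev".
sample = parse_graphite_line("dev.app-1.driver.jvm.heap.used 1024 1700000000\n")
print(sample.path, sample.value, sample.timestamp)
```

Grouping the parsed samples by the path component after the prefix is one way to tell driver and executor metrics apart on the receiving side.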