I have a Spark Streaming application running in yarn-cluster mode that reads from a Kafka topic. I want to connect a JMX console (such as JConsole) or Java VisualVM to these remote processes in a Cloudera distribution to gather some performance benchmarks.
How would I go about doing that?
The way I've done this is to set (or add to) the following property, which also starts Java Flight Recorder:
spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=0
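A minimal sketch of passing that property at submit time, assuming Python 3 and Oracle JDK 8 on the executors (the UnlockCommercialFeatures/FlightRecorder pair is Oracle JDK specific). The helper names and the jar name `my-streaming-app.jar` are hypothetical; `spark-submit --master yarn --deploy-mode cluster --conf key=value` is the standard way to set a Spark property for a yarn-cluster job.

```python
from __future__ import annotations

import shlex


def build_executor_java_opts(jmx_port: int = 0) -> str:
    """Return the value for spark.executor.extraJavaOptions (hypothetical helper)."""
    flags = [
        "-XX:+UnlockCommercialFeatures",  # needed for Flight Recorder on Oracle JDK 8
        "-XX:+FlightRecorder",
        "-Dcom.sun.management.jmxremote",
        "-Dcom.sun.management.jmxremote.authenticate=false",  # no auth: trusted networks only
        "-Dcom.sun.management.jmxremote.ssl=false",
        f"-Dcom.sun.management.jmxremote.port={jmx_port}",  # 0 = let the JVM pick a free port
    ]
    return " ".join(flags)


def spark_submit_args(app_jar: str, jmx_port: int = 0) -> list[str]:
    """spark-submit argv for yarn-cluster mode with JMX enabled on the executors."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # One argv element: the whole option string is the property value.
        "--conf", f"spark.executor.extraJavaOptions={build_executor_java_opts(jmx_port)}",
        app_jar,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(spark_submit_args("my-streaming-app.jar")))
```

The option string is kept as a single argument, so it can be passed directly to `subprocess.run` without a shell; `shlex.join` only adds quoting for copy-paste.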
If you run only one executor on each host, you can set the port to a fixed value. If several executors share a host, a fixed port would collide, so use port 0 (the JVM picks a free port) and then use lsof to find which port was assigned.
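The lsof lookup can be sketched as below, assuming Linux hosts with `pgrep` and `lsof` installed, and assuming executor JVMs can be found by the `CoarseGrainedExecutorBackend` class name on their command line (verify this for your Spark version). All helper names are hypothetical. An executor JVM listens on several ports (JMX as well as Spark's own), so the script lists every candidate as a JConsole/VisualVM URL and you may need to try each one. Run it as the executors' user or with sudo, otherwise lsof cannot see their sockets.

```python
import shutil
import socket
import subprocess
from typing import List


def _run(cmd: List[str]) -> str:
    """Run a command and return stdout, or "" if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def parse_listen_ports(lsof_output: str) -> List[int]:
    """Extract TCP listening ports from `lsof -nP -iTCP -sTCP:LISTEN` output."""
    ports = set()
    for line in lsof_output.splitlines()[1:]:  # skip the COMMAND/PID/... header
        fields = line.split()
        if not fields or fields[-1] != "(LISTEN)":
            continue
        name = fields[-2]  # e.g. "*:41234", "127.0.0.1:41234" or "[::1]:41234"
        port = name.rsplit(":", 1)[-1]
        if port.isdigit():
            ports.add(int(port))
    return sorted(ports)


def listen_ports_for_pid(pid: int) -> List[int]:
    # -a ANDs the -p and -i filters; -n/-P skip host and port name lookups.
    out = _run(["lsof", "-nP", "-a", "-p", str(pid), "-iTCP", "-sTCP:LISTEN"])
    return parse_listen_ports(out)


def executor_pids() -> List[int]:
    # May also match YARN's bash launcher; those PIDs simply have no listening ports.
    out = _run(["pgrep", "-f", "CoarseGrainedExecutorBackend"])
    return [int(p) for p in out.split()]


if __name__ == "__main__":
    host = socket.getfqdn()
    for pid in executor_pids():
        for port in listen_ports_for_pid(pid):
            # One of these is the JMX port; the others belong to Spark itself.
            print(f"pid {pid}: service:jmx:rmi:///jndi/rmi://{host}:{port}/jmxrmi")
```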