Tags: apache-spark, jvm-arguments, pyspark

Specify options for the JVM launched by pyspark


How and where do I specify the JVM options that the pyspark script uses when it launches the JVM it connects to?

I am specifically interested in JVM debugging options, e.g.

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

Thanks.


Solution

  • pyspark uses the bin/spark-class script to start the client that you see running in your terminal / console. You can append whatever options you need to JAVA_OPTS:

    JAVA_OPTS="$JAVA_OPTS -Xmx2g -Xms1g -agentlib:jdwp=transport=dt_socket,server=y..."
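As a minimal sketch of the approach described in this answer, the snippet below builds the full debugging option from the question and appends it to JAVA_OPTS before pyspark is started. It assumes, as the answer states, that bin/spark-class reads JAVA_OPTS; that may not hold for every Spark release, and spark-submit's `--driver-java-options` flag is another place to try if the variable appears to be ignored. The helper name `jdwp_opt` and the heap sizes are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): prints a JDWP agent option
# for the given port, defaulting to 5005 as in the question.
jdwp_opt() {
  port="${1:-5005}"
  printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=${port}"
}

# Append to any options already set, keeping earlier values intact.
# Note the heap flags take no '=' sign: -Xmx2g, not -Xmx=2g.
JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }-Xmx2g -Xms1g $(jdwp_opt 5005)"
export JAVA_OPTS

# Show what the launched JVM would receive.
echo "JAVA_OPTS=$JAVA_OPTS"

# Then start the shell from the Spark directory (path is an assumption):
# ./bin/pyspark
```

With `suspend=n` the JVM starts without waiting; a debugger can attach to port 5005 at any point afterwards. Use `suspend=y` if you need to break before any Spark code runs.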