apache-spark, pyspark, spark-submit, spark-dotnet

Convert a spark-submit command for a .NET for Apache Spark app into a spark-submit command for a Python app


If the following (working) spark-submit command for a .NET for Apache Spark app were executing a Python script instead, would it still use the same --conf settings? And given a Python script named myapp.py that defines no function other than main, what would the --class reference be for a Python script?

/opt/spark/bin/spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner \
   --conf "spark.eventLog.enabled=true" \
   --conf "spark.eventLog.dir=file:/usr/bin/spark/hadoop/logs" \
   --master spark://spark:7077 \
   /opt/spark/jars/microsoft-spark-3-1_2.12-2.0.0.jar \
   dotnet myapp.dll "somefilename.txt"

Solution

  • For Python applications, the --conf settings stay exactly the same; simply pass the .py file as the application, with no --class needed, since --class only names a JVM entry point (here, org.apache.spark.deploy.dotnet.DotnetRunner) and a Python script has none

    /opt/spark/bin/spark-submit \
    --conf "spark.eventLog.enabled=true" \
    --conf "spark.eventLog.dir=file:/usr/bin/spark/hadoop/logs" \
    --master spark://spark:7077 \
    /your python file path/myapp.py
    

    For further information, refer to https://spark.apache.org/docs/latest/submitting-applications.html
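
  • For reference, a minimal sketch of what myapp.py could look like, assuming the "somefilename.txt" argument from the original command is a text file to process (the line-count workload is hypothetical; only the sys.argv handling reflects how spark-submit hands arguments to the script):

    import sys

    from pyspark.sql import SparkSession

    def main():
        # spark-submit forwards everything after the .py path as application
        # arguments, so "somefilename.txt" would arrive as sys.argv[1].
        input_path = sys.argv[1]

        spark = SparkSession.builder.appName("myapp").getOrCreate()

        # Hypothetical workload: count the lines of the input text file.
        count = spark.read.text(input_path).count()
        print(f"{input_path} has {count} lines")

        spark.stop()

    if __name__ == "__main__":
        main()

    Arguments placed after the application file are passed straight through to it, so the equivalent invocation would end with myapp.py "somefilename.txt", just as the original command ends with myapp.dll and its argument.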