I am running local PySpark code from the command line, and it works:
/Users/edamame/local-lib/apache-spark/spark-1.5.1/bin/pyspark --jars myJar.jar --driver-class-path myJar.jar --executor-memory 2G --driver-memory 4G --executor-cores 3 /myPath/myProject.py
Is it possible to run this code from Eclipse using PyDev? What arguments are required in the Run Configuration? I tried and got the following error:
Traceback (most recent call last):
  File "/myPath/myProject.py", line 587, in <module>
    main()
  File "/myPath/myProject.py", line 506, in main
    conf = SparkConf()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/context.py", line 234, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/java_gateway.py", line 76, in launch_gateway
    proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1308, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Does anyone have any idea? Thank you very much!
The OSError: [Errno 2] raised from launch_gateway typically means Popen could not find the spark-submit script, i.e. the environment Eclipse launches your script with does not point at a valid Spark installation. Assuming you already have Eclipse with PyDev installed and a local Spark distribution unpacked (spark-1.5.1 in your case), here is what you'll need to do:
From the Eclipse IDE, check that you are in the PyDev perspective.
From the Preferences window, go to PyDev > Interpreters > Python Interpreter, and configure the interpreter your project uses: add the environment variables your script needs there (in particular SPARK_HOME), and add the pyspark and py4j libraries that ship under $SPARK_HOME/python to the interpreter's libraries.
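As a sanity check, the following minimal sketch replicates what that configuration has to accomplish at the top of the script itself. The paths are taken from your question, and the py4j zip name may differ in your installation; the PYSPARK_SUBMIT_ARGS value mirrors your command-line flags, and the trailing pyspark-shell token is required when the gateway is launched from a plain Python process:

    # Minimal bootstrap sketch -- paths below are from the question; adjust
    # SPARK_HOME and the py4j zip name to match your installation.
    import glob
    import os
    import sys

    SPARK_HOME = "/Users/edamame/local-lib/apache-spark/spark-1.5.1"
    os.environ["SPARK_HOME"] = SPARK_HOME

    # Make the bundled pyspark and py4j packages importable.
    sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
    sys.path.insert(0, glob.glob(os.path.join(SPARK_HOME, "python/lib/py4j-*.zip"))[0])

    # Same flags as on the command line; the trailing "pyspark-shell" tells
    # spark-submit what to launch when pyspark starts from plain Python.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--jars myJar.jar --driver-class-path myJar.jar "
        "--executor-memory 2G --driver-memory 4G --executor-cores 3 pyspark-shell"
    )

    from pyspark import SparkConf, SparkContext

    conf = SparkConf()
    sc = SparkContext(conf=conf)

Once the interpreter is configured in Preferences, a plain PyDev Run Configuration needs no extra arguments; the environment variables do the work.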
I also recommend that you maintain your own log4j.properties file in each of your projects.
To do so, you'll need to add the SPARK_CONF_DIR environment variable as done previously, for example:
Name: SPARK_CONF_DIR, Value: ${project_loc}/conf
If you experience problems with the ${project_loc} variable (e.g. on Linux), specify an absolute path instead.
Or, if you want to keep ${project_loc}, right-click on each Python source file and choose Run As > Run Configurations..., then create the SPARK_CONF_DIR variable in the Environment tab as described previously.
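For reference, here is a minimal conf/log4j.properties, adapted from the log4j.properties.template that ships with Spark; the rootCategory level is the usual knob for quieting console output:

    # Quiet the console; switch WARN to INFO or DEBUG when troubleshooting.
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n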
Occasionally, you can add other environment variables in the same way, such as TERM, SPARK_LOCAL_IP, and so on.
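For example (these values are illustrative only, not required):

    Name: TERM,           Value: xterm-256color
    Name: SPARK_LOCAL_IP, Value: 127.0.0.1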
PS: I don't remember the source of this tutorial, so excuse me for not citing the author; I didn't come up with this by myself.