
Can't find Spark script when running spark-submit via PuTTY


I was trying to run a Spark job on HDFS, connected through PuTTY:

spark-submit WorstMoviesSpark.py

But running the command above returned an error:

python: can't open file '/home/maria_dev/WorstMoviesSpark.py': [Errno 2] No such file or directory
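The `[Errno 2]` here is reported by the local filesystem of the node where `spark-submit` runs, not by HDFS: in client mode, the Python driver script must be readable at that local path. A minimal sketch of a pre-flight check (the path is taken from the error message; the helper name is my own):

```python
import os

def local_script_exists(path):
    """Return True if the script is present on the local filesystem,
    i.e. the filesystem the spark-submit driver reads in client mode."""
    return os.path.isfile(path)

# The path from the error message above; [Errno 2] means this
# check returns False on the node where spark-submit was run.
print(local_script_exists("/home/maria_dev/WorstMoviesSpark.py"))
```

If the check fails, the script was never uploaded to that machine (which turned out to be the issue here).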

*Edit: this was my mistake: I had uploaded the script to my local machine instead of the HDP sandbox. The answer below is still useful, though.


Solution

  • You do not have the required permissions on the code file to execute it via Spark. Run the following command:

    hdfs dfs -chmod 777 WorstMoviesSpark.py

    Then, in your spark-submit command, set the master to yarn and run the code as follows:

    spark-submit --master yarn --deploy-mode client /hdfs/path/to/WorstMoviesSpark.py