Normally we do a spark-submit with the ZIP file:

spark-submit --name App_Name --master yarn --deploy-mode cluster --archives /<path>/myzip.zip#pyzip /<path>/Processfile.py

In the .py files we access the modules with from dir1.dir2.dir3.module_name import module_name, and the import works fine.
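For context, a minimal sketch of how Processfile.py might make the unpacked archive importable, assuming the #pyzip alias is extracted into the container's working directory (the directory layout inside the ZIP and the package path are taken from the question, not confirmed):

import os
import sys

# Assumption: the archive passed via --archives ...#pyzip is extracted
# under ./pyzip in each container's working directory, so put that
# directory on sys.path before importing from it.
sys.path.insert(0, os.path.join(os.getcwd(), "pyzip"))

from dir1.dir2.dir3.module_name import module_name  # hypothetical package path from the question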
When I try to do the same in the pyspark shell, it gives me a module-not-found error:

pyspark --py-files /<path>/myzip.zip#pyzip
How can the modules be accessed in the spark shell?
I was finally able to import the modules in the PySpark shell. The ZIP I am passing contains all the dependency modules installed into a Python virtual environment and then zipped up. In such cases, activating the virtual environment first and then starting the PySpark shell did the trick:
source bin/activate
pyspark --archives <path>/filename.zip
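Once the shell is up, the packaged modules can be imported directly. A quick check, using the hypothetical package path from the question:

>>> from dir1.dir2.dir3.module_name import module_name  # should resolve without a ModuleNotFoundError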
This didn't require adding the py-files to the SparkContext either.
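For reference, the alternative that turned out to be unnecessary would have been distributing the ZIP from inside the shell via SparkContext.addPyFile, roughly like this (the path is the placeholder from the question, and whether the imports resolve depends on how the modules are laid out inside the ZIP):

# Inside the pyspark shell, `sc` is the SparkContext created for the session.
sc.addPyFile("/<path>/myzip.zip")  # ships the ZIP to executors and puts it on sys.path
from dir1.dir2.dir3.module_name import module_name  # hypothetical package path from the question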