Tags: python, apache-spark, pyspark, zip

Accessing user-defined modules in the PySpark shell (ModuleNotFoundError: No module named)


Normally we do a spark-submit with the ZIP file:

    spark-submit --name App_Name --master yarn --deploy-mode cluster --archives /<path>/myzip.zip#pyzip /<path>/Processfile.py

and in the .py files we access the modules with

    from dir1.dir2.dir3.module_name import module_name

and the module import works fine.
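
For that import to resolve, the archive root has to mirror the package path, i.e. the ZIP must contain dir1/dir2/dir3/ (each level with an __init__.py) at its top level. A minimal sketch of building such an archive with the standard library, assuming it is run from the directory that contains dir1 (the helper name build_dependency_zip is illustrative, not from the original post):

    import os
    import zipfile

    def build_dependency_zip(package_root, zip_path):
        """Zip the package tree rooted at package_root (e.g. "dir1"), storing
        paths relative to its parent so the archive root contains dir1/..."""
        parent = os.path.dirname(os.path.abspath(package_root))
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for folder, _, files in os.walk(package_root):
                for name in files:
                    full = os.path.join(folder, name)
                    zf.write(full, os.path.relpath(os.path.abspath(full), parent))

    # Run from the directory that holds dir1/
    build_dependency_zip("dir1", "myzip.zip")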

When I try to do the same in the pyspark shell, it gives me a module not found error:

    pyspark --py-files /<path>/myzip.zip#pyzip
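
One quick way to see whether the archive is even visible to the driver is to inspect sys.path from inside the shell (this check is a suggestion, not something from the original post):

    # Inside the pyspark shell: list any ZIP entries Spark has put on the
    # driver's module search path; the dependency ZIP should appear here
    # before the dir1.dir2.dir3 imports can work.
    import sys
    print([p for p in sys.path if p.endswith(".zip")])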

How can the modules be accessed in the PySpark shell?


Solution

  • I was finally able to import the modules in the PySpark shell. The ZIP I am passing has all the dependency modules installed into a Python virtual environment and then packaged as a ZIP.

    So in such cases, activating the virtual environment and then starting the PySpark shell did the trick:

    # activate the virtual environment the ZIP was built from
    source bin/activate
    # start the PySpark shell with the packaged dependencies attached
    pyspark --archives <path>/filename.zip
    

    This also didn't require adding the py-files to the SparkContext.
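
    For completeness, the runtime alternative that turned out to be unnecessary here would look roughly like this from inside an already-running PySpark shell (the path is a placeholder, as above):

    # Register the ZIP with the running SparkContext instead of (or in addition
    # to) passing it on the command line; addPyFile accepts .zip archives.
    spark.sparkContext.addPyFile("/<path>/myzip.zip")
    from dir1.dir2.dir3.module_name import module_name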