Tags: linux, python-3.x, pyspark, jupyter-notebook, sudo

Why does importing pyspark in python3 need superuser access on my Linux machine?


I installed pyspark using pip3. Whenever I try to import pyspark in python3, I get an error:

avinash@avinash-HP-ProBook-445-G1:~$ python3
Python 3.7.0 (default, Jun 28 2018, 13:15:42) 
[GCC 7.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyspark'

On the other hand, when I use sudo python3, everything works fine!

A similar thing happens in Jupyter notebook as well: I have to run sudo jupyter notebook --allow-root to be able to import pyspark.

However, importing other packages like numpy, also installed with pip3, works fine without sudo.
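
One way to see what is going on (not part of the original question, just a diagnostic sketch) is to compare which interpreter and module search path are used with and without sudo; if sudo python3 launches a different Python (for example the system interpreter instead of Anaconda's), it will look in different site-packages directories:

# Run this in both "python3" and "sudo python3" and compare the output.
# Different executables imply different site-packages directories, which
# would explain why pyspark is visible to one interpreter and not the other.
import sys
print(sys.executable)   # the interpreter actually running
for p in sys.path:      # the directories it searches for modules
    print(p)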

Update: I originally installed pyspark using sudo pip3 install pyspark. I tried uninstalling it and then reinstalling it without sudo, i.e. pip3 install pyspark, but that gives an error:

Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/dist-packages/pyspark-2.4.0.dist-info' Consider using the --user option or check the permissions.

The strange thing is that there is no file or directory named 'pyspark-2.4.0.dist-info' in /usr/local/lib/python3.6/dist-packages, even though the error mentions it.

I also tried giving full permissions (777) to the above-mentioned directory.
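
For reference, the pip error itself suggests the --user option; a user-level install (pip3 install --user pyspark, no sudo) goes into the per-user site-packages directory. A small hedged check of where that directory is and whether the current interpreter searches it:

# Hedged sketch: print the per-user site-packages directory that a
# "pip3 install --user pyspark" would write to, and check whether this
# interpreter would actually search it.
import site, sys
user_site = site.getusersitepackages()   # e.g. ~/.local/lib/python3.x/site-packages
print(user_site)
print(user_site in sys.path)             # True means a --user install would be importable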


Solution

  • Based on the error you get, it seems you are using Anaconda on Linux. In that case, install pyspark with the command below (a quick verification sketch follows after it):

    conda install -c conda-forge pyspark
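
    As a quick sanity check (not part of the original answer, just a sketch), confirm that the conda-forge package is picked up by the Anaconda interpreter without sudo:

    # Run with plain "python3" (no sudo); the import should now succeed.
    import pyspark
    print(pyspark.__version__)   # prints the installed version, e.g. 2.4.x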