When I try to import pyspark in a Python script in PyCharm, I get the error below (cannot import x from y). I checked the directory, and the module that should be imported is not there.
That's all there is in pyspark\cloudpickle\
Why is it not installed? What could be possible problems?
Compatibility issues? I found this, which looks similar, but my error says "cannot import name".
I also found this about cloudpickle specifically; I tried cloudpickle=1.1.1, but it didn't work for me.
I also made a new env, re-installed pyspark and rebooted, but it didn't help.
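To show what is actually on disk without triggering the failing import, here is a minimal sketch that locates the top-level package and lists one of its subdirectories. The helper name `list_package_dir` is my own (hypothetical), not part of pyspark; it assumes the package is installed as a regular directory in site-packages.

```python
import importlib.util
from pathlib import Path


def list_package_dir(package, subdir=""):
    """Return sorted file names in <package>/<subdir>, or None if not found.

    Only the top-level package is located; it is never imported, so a
    package whose import fails can still be inspected.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package not found, or not a directory-style package
    target = Path(next(iter(spec.submodule_search_locations))) / subdir
    if not target.is_dir():
        return None  # the subdirectory does not exist in this install
    return sorted(p.name for p in target.iterdir())


if __name__ == "__main__":
    # Prints the contents of pyspark/cloudpickle, or None if it is absent.
    print(list_package_dir("pyspark", "cloudpickle"))
```

Running this in the same interpreter that PyCharm uses should make it clear whether the environment being inspected is the one the script actually runs in.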
import findspark
findspark.init()
Works without error.
Obviously I'm new to Spark/PySpark and might be missing the obvious...
import pyspark
Traceback (most recent call last):
  File "C:\Users\me\anaconda3\envs\myenv\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-9-d008122bb79d>", line 3, in <cell line: 3>
    from pyspark.sql import Row
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.3.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\me\anaconda3\envs\myenv\lib\site-packages\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.3.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\me\anaconda3\envs\myenv\lib\site-packages\pyspark\context.py", line 33, in <module>
    from pyspark.broadcast import Broadcast, BroadcastPickleRegistry
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.3.1\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\me\anaconda3\envs\myenv\lib\site-packages\pyspark\broadcast.py", line 25, in <module>
    from pyspark.cloudpickle import print_exec
ImportError: cannot import name 'print_exec' from 'pyspark.cloudpickle' (C:\Users\me\anaconda3\envs\myenv\lib\site-packages\pyspark\cloudpickle\__init__.py)
I am working in PyCharm IDE (PyCharm Community Edition 2021.3.1)
Python 3.10.4 | packaged by conda-forge | (main, Mar 30 2022, 08:38:02) [MSC v.1916 64 bit (AMD64)]
>conda list | grep pyspark
pyspark 3.2.1
>conda info
conda version : 4.12.0
conda-build version : 3.20.5
python version : 3.8.5.final.0
Since the files were not there, I downloaded pyspark manually from the website and replaced the previous pyspark installation with the newly downloaded one.
This got rid of all the import errors.
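For anyone who wants to confirm an incomplete install before replacing files by hand, here is a rough sketch using the standard library's `importlib.metadata`. It compares the files recorded in the package metadata with what exists on disk. The helper name `missing_files` is hypothetical, and this assumes the installer wrote a file record; if it did not (which may be the case for some conda packages), the function returns None rather than guessing.

```python
from importlib import metadata


def missing_files(dist_name):
    """Return recorded files that are absent on disk, or None if unknown."""
    try:
        recorded = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return None  # distribution is not installed in this environment
    if recorded is None:
        return None  # no file record available in the metadata
    # locate() resolves each recorded entry to its absolute path on disk.
    return [str(f) for f in recorded if not f.locate().exists()]


if __name__ == "__main__":
    # An empty list means every recorded file is present.
    print(missing_files("pyspark"))
```

A non-empty result would point to a partial or overwritten install. An empty list does not rule out files left over from a different version.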