I am running this import statement:
from pyspark.ml.feature import VectorAssembler
And this is the full traceback:
ModuleNotFoundError Traceback (most recent call last)
Cell In[5], line 1
----> 1 from pyspark.ml.feature import VectorAssembler
File /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pyspark/ml/__init__.py:22
1 #
2 # Licensed to the Apache Software Foundation (ASF) under one or more
3 # contributor license agreements. See the NOTICE file distributed with
(...)
15 # limitations under the License.
16 #
18 """
19 DataFrame-based machine learning APIs to let users quickly assemble and configure practical
20 machine learning pipelines.
21 """
How do I add the MLlib library to Spark?
This solved my issue:
Try pip install numpy (or pip3 install numpy if that fails). The traceback shows the numpy module is not found: pyspark.ml imports numpy internally, so MLlib is already part of your PySpark installation — it is the numpy dependency that is missing.
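As a quick sanity check before retrying the import, you can probe whether numpy is visible to the same interpreter that runs PySpark (a minimal sketch using the standard-library `importlib.util.find_spec`, which only inspects the current Python environment):

```python
import importlib.util

# find_spec returns None when the module cannot be found in this
# interpreter's environment -- the same condition that makes
# "from pyspark.ml.feature import VectorAssembler" raise
# ModuleNotFoundError from inside pyspark.ml.
if importlib.util.find_spec("numpy") is None:
    print("numpy is missing -- run: python3 -m pip install numpy")
else:
    print("numpy found; the pyspark.ml import should now succeed")
```

If you have multiple Python installations (e.g. a system Python and the python.org 3.13 framework build shown in the traceback), make sure `pip` installs into the same one, for example with `python3 -m pip install numpy`.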