apache-spark pyspark jupyter-lab apache-spark-dataset apache-spark-ml

Error when importing VectorAssembler in JupyterLab (PySpark)


I am running this import statement:

from pyspark.ml.feature import VectorAssembler

And this is the full traceback:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[5], line 1
----> 1 from pyspark.ml.feature import VectorAssembler

File /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pyspark/ml/__init__.py:22
      1 #
      2 # Licensed to the Apache Software Foundation (ASF) under one or more
      3 # contributor license agreements.  See the NOTICE file distributed with
   (...)
     15 # limitations under the License.
     16 #
     18 """
     19 DataFrame-based machine learning APIs to let users quickly assemble and configure practical
     20 machine learning pipelines.
     21 """

Solution

  • How to add Mlib library to Spark?

    This solved my issue:

    Run pip install numpy (or pip3 install numpy if that fails). The traceback says the numpy module was not found, and pyspark.ml imports NumPy when it loads, so the import of VectorAssembler fails without it.
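
One thing that may help if the error persists after installing: Jupyter kernels do not always use the same Python that a bare pip on the command line installs into. The following is a minimal sketch, not part of the original answer, that can be run in a notebook cell to check whether numpy is visible to the kernel's interpreter and to print a pip command tied to that interpreter. The helper names missing_modules and pip_command are hypothetical, chosen only for illustration.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def pip_command(packages):
    """Build a pip invocation bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", *packages]


# numpy is the module the answer above identifies as missing.
missing = missing_modules(["numpy"])
if missing:
    print("Run in a terminal:", " ".join(pip_command(missing)))
else:
    print("numpy is importable from", sys.executable)
```

If the cell prints an install command, run it in a terminal and restart the kernel before retrying the VectorAssembler import.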