Tags: python, pandas, apache-zeppelin, spark-notebook

Apache Zeppelin Error When Importing Pandas


I'm facing a strange error when importing the Pandas library into my Zeppelin notebook. Here is the basic code that I have as part of my cell:

%python

import pandas as pd

df = pd.read_csv(r'target/youtube_videos.csv')
print(df)

I get the following error:

Fail to execute line 3: import pandas as pd
Traceback (most recent call last):
  File "/tmp/1636039066525-0/zeppelin_python.py", line 153, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 3, in <module>
ModuleNotFoundError: No module named 'pandas'

To check which Python I am actually running, I tried:

%sh
python --version
python3-config --configdir

This gives me the following:

Python 3.7.0b3
/usr/lib/python3.8/config-3.8-x86_64-linux-gnu
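The two commands above report different versions (3.7.0b3 vs. 3.8), which suggests more than one Python installation on the machine. A minimal sketch that makes this explicit by resolving each interpreter name on PATH to its absolute path and asking that interpreter for its own version. The names `python` and `python3` and the helper `describe_interpreter` are assumptions for illustration.

```python
import shutil
import subprocess


def describe_interpreter(name):
    """Return (absolute_path, version) for an interpreter name, or None if not found.

    version is None if the binary exists but could not be run successfully.
    """
    path = shutil.which(name)  # resolves the name the same way the shell would
    if path is None:
        return None
    # Ask the interpreter itself; sys.version.split()[0] is e.g. "3.7.0b3".
    result = subprocess.run(
        [path, "-c", "import sys; print(sys.version.split()[0])"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return path, None
    return path, result.stdout.strip()


if __name__ == "__main__":
    # "python" and "python3" may point at different installations.
    for name in ("python", "python3"):
        info = describe_interpreter(name)
        if info is None:
            print(f"{name}: not found on PATH")
        else:
            print(f"{name}: {info[0]} (version {info[1] or 'unknown'})")
```

If the two names print different paths and versions, a package installed for one of them is not visible to the other.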

I'm using Zeppelin 0.10.0.

EDIT:

I tried the following:

joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ zstart
Please specify HADOOP_CONF_DIR if USE_HADOOP is true
Zeppelin start                                             [  OK  ]
joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ python
Python 3.7.0b3 (default, Mar 30 2018, 04:35:22) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
>>> 

Pandas seems to be already installed:

joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ pip3 install pandas
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas in /usr/local/lib/python3.8/dist-packages (1.3.4)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/lib/python3/dist-packages (from pandas) (2.7.3)
Requirement already satisfied: numpy>=1.17.3 in /usr/lib/python3/dist-packages (from pandas) (1.17.4)
Requirement already satisfied: pytz>=2017.3 in /usr/lib/python3/dist-packages (from pandas) (2019.3)
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ 
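Note that pip3 installed pandas into /usr/local/lib/python3.8/dist-packages, while the python started above is 3.7.0b3. A minimal sketch for checking, per interpreter, whether a module can actually be imported; each interpreter only searches its own package directories. The helper name `can_import` and the candidate names `python` and `python3` are assumptions.

```python
import subprocess


def can_import(interpreter, module):
    """Return True if `interpreter` can import `module`, False otherwise."""
    # Each interpreter only searches its own site-packages/dist-packages,
    # so a package installed for Python 3.8 is invisible to Python 3.7.
    result = subprocess.run(
        [interpreter, "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    for interpreter in ("python", "python3"):
        try:
            ok = can_import(interpreter, "pandas")
        except FileNotFoundError:
            print(f"{interpreter}: not found")
            continue
        print(f"{interpreter}: pandas importable = {ok}")
```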

I have even set the Python interpreter in Zeppelin's interpreter settings.


Solution

  • It looks like the Python interpreter used by Zeppelin isn't configured properly. You may have several different Pythons installed: you think Zeppelin uses one, but it actually uses another. Check the zeppelin.python parameter, then check whether pandas is installed for that Python (I suspect it isn't).

    This parameter specifies "Path of the already installed Python binary. If python is not in your $PATH you can set the absolute directory (example : /usr/bin/python)"

    By default, Zeppelin uses the Python defined in the zeppelin.python property to run the Python process. The interpreter can use all modules already installed for that Python (with pip, easy_install, ...).

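    So you need to install pandas for the interpreter that Zeppelin uses.

    Alternatively, set this parameter to the path of a Python interpreter that already has pandas installed. In your case, pip3 installed pandas into /usr/local/lib/python3.8/dist-packages and names /usr/bin/python3 as its interpreter, so pointing zeppelin.python at /usr/bin/python3 is likely to work, whereas the python on your PATH is 3.7.0b3 and does not see that package.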
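A minimal sketch that could be run in a %python paragraph to see which interpreter the notebook actually launched and whether pandas is visible to it. The helper name `interpreter_report` is hypothetical; `sys.executable` should normally correspond to the zeppelin.python setting, though this is an assumption about your configuration.

```python
import importlib.util
import sys


def interpreter_report(module="pandas"):
    """Describe the Python running this code and whether `module` is importable."""
    return {
        # Absolute path of the running interpreter.
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # find_spec returns None when the module is not on this interpreter's path.
        "has_module": importlib.util.find_spec(module) is not None,
        # Command that installs the module for *this* interpreter specifically.
        "install_command": [sys.executable, "-m", "pip", "install", module],
    }


report = interpreter_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running `python -m pip` through the exact interpreter path avoids the mismatch seen above, where pip3 and python belonged to different installations.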