
Zeppelin unable to import pandas, numpy, scipy


The code below imports fine in the shell when I start pyspark there, but the same code fails in Zeppelin.

 %pyspark
import pandas

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-4245945050627073162.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
ImportError: No module named pandas

Solution

  • This is most likely because pandas is not installed for the Python interpreter that Zeppelin runs, which can differ from the one the pyspark shell uses.

    If pip is not installed, first install pip.

    curl --silent --show-error https://bootstrap.pypa.io/get-pip.py | sudo python
    

    Then install pandas

    sudo pip install pandas
    

    Or use Docker like this.

    docker run -d -p 8080:8080 -t knockdata/zeppelin-highcharts
    

    The zeppelin-highcharts image includes pandas and Highcharts functionality.
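
To confirm which Python the notebook is actually using before installing anything, a small check like the one below may help. It is only a sketch: the helper name `missing_modules` is made up for illustration, and it assumes the code is pasted into a `%pyspark` paragraph in Zeppelin (and, for comparison, into the pyspark shell).

```python
import importlib
import sys


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Path of the Python binary running this code. If it differs between
# Zeppelin and the pyspark shell, the two use separate package sets.
print(sys.executable)

# Modules from the question that this interpreter cannot find.
print(missing_modules(["pandas", "numpy", "scipy"]))
```

If the two paths differ, install pandas with the pip that belongs to the interpreter Zeppelin reports, rather than whichever pip is first on the shell's PATH.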