I keep getting this error when I run my notebook on Google Cloud Dataproc:
import numpy as np
ImportError: ('No module named numpy', <function _parse_datatype_json_string at 0x7fc294e25230>.......
But I don't get the error when running locally with the same Python 2.7.
I found that the numpy version on my local machine is
numpy.version.version
'1.11.1'
but on Google Cloud Dataproc it is the older **'1.8.2'**.
As mentioned in other answers (ImportError: No module named numpy - Google Cloud Dataproc when using Jupyter Notebook), I tried this to upgrade:
import os
import sys
sys.path.append('/usr/lib/python2.7/dist-packages')
os.system("sudo apt-get install python-pandas -y")
os.system("sudo apt-get install python-numpy -y")
os.system("sudo apt-get install python-scipy -y")
os.system("sudo apt-get install python-sklearn -y")
import pandas
import numpy
import scipy
import sklearn
I still get version 1.8.2.
The pip command doesn't have permission on Dataproc.
I tried pip with sudo, but that didn't work either.
IOError: [Errno 13] Permission denied: '/usr/local/bin/miniconda/lib/python2.7/site-packages/easy-install.pth'
my-user-name@cluster-name-1-m:~$ sudo pip install numpy
sudo: pip: command not found
Edit: We've now added a metadata option, JUPYTER_CONDA_PACKAGES, to automatically pre-install packages through conda during the Jupyter setup. As now covered by the examples, the preferred way to get your packages installed is with:
gcloud dataproc clusters create my-cluster \
--initialization-actions gs://dataproc-initialization-actions/jupyter/jupyter.sh \
--metadata JUPYTER_CONDA_PACKAGES=numpy:pandas:scikit-learn:scipy
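The metadata value in the command above is a colon-separated list of package names. If you generate the command from a script, a minimal sketch might look like the following. The helper names conda_packages_metadata and create_cluster_command are made up for illustration, and nothing here actually calls gcloud:

```python
def conda_packages_metadata(packages):
    """Join package names into the colon-separated form used in the command above."""
    cleaned = [p.strip() for p in packages if p.strip()]
    return "JUPYTER_CONDA_PACKAGES=" + ":".join(cleaned)


def create_cluster_command(cluster_name, packages):
    """Build the argument list only; pass it to subprocess yourself if desired."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--initialization-actions",
        "gs://dataproc-initialization-actions/jupyter/jupyter.sh",
        "--metadata", conda_packages_metadata(packages),
    ]


cmd = create_cluster_command(
    "my-cluster", ["numpy", "pandas", "scikit-learn", "scipy"]
)
print(" ".join(cmd))
```

Keeping the command as a list rather than one string avoids shell-quoting problems if it is later handed to subprocess.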
If you aren't using this metadata value, the historical answer is kept below for posterity and for more internal details:
Dataproc's Jupyter initialization action also installs conda, so on your master node you can just run:
sudo su
conda install numpy
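After the install, it can help to confirm from inside the notebook which interpreter the kernel runs and where numpy is loaded from, since apt-get and conda put packages in different places. This is only a sketch; the helper name describe_module is hypothetical:

```python
import importlib
import sys


def describe_module(name):
    """Return where a module is loaded from and its version, or None if missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "python": sys.executable,                      # interpreter running the kernel
        "file": getattr(module, "__file__", None),      # location of the imported module
        "version": getattr(module, "__version__", None),
    }


print(describe_module("numpy"))
```

If the reported paths point into the miniconda directory seen in the error above, the kernel is using conda's Python, and packages installed with apt-get into /usr/lib/python2.7/dist-packages will not be the ones it imports.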
Depending on how it's used, you may also need it on your worker nodes. To install it automatically on your deployments, customize the main jupyter.sh script by adding the line conda install numpy anywhere after the /dataproc-initialization-actions/conda/bootstrap-conda.sh line, upload your custom init action to GCS, and specify it instead of gs://dataproc-initialization-actions/jupyter/jupyter.sh. Something like:
gsutil cp gs://dataproc-initialization-actions/jupyter/jupyter.sh .
echo "conda install numpy" >> jupyter.sh
gsutil cp jupyter.sh gs://my-bucket/jupyter_with_numpy.sh
gcloud dataproc clusters create my-cluster \
--initialization-actions gs://my-bucket/jupyter_with_numpy.sh
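The echo approach appends to the end of the script. If you would rather place the line directly after the bootstrap-conda.sh line, a small sketch of that text edit is below. The function name add_line_after_marker is made up, and the sample string is a hypothetical stand-in, not the real contents of jupyter.sh:

```python
def add_line_after_marker(script_text, marker, new_line):
    """Insert new_line right after the first line that contains marker."""
    lines = script_text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            lines.insert(i + 1, new_line)
            return "\n".join(lines) + "\n"
    raise ValueError("marker not found: %r" % marker)


# Hypothetical stand-in for the downloaded jupyter.sh
sample = (
    "#!/bin/bash\n"
    "./dataproc-initialization-actions/conda/bootstrap-conda.sh\n"
    "echo done\n"
)
patched = add_line_after_marker(
    sample, "conda/bootstrap-conda.sh", "conda install numpy"
)
print(patched)
```

In practice you would read the downloaded jupyter.sh, apply the function, and write the result back before uploading it with gsutil.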
Finally, you can also use the built-in package manager in the Jupyter UI to browse and install conda packages.