I am trying to install conda on EMR and below is my bootstrap script, it looks like conda is getting installed but it is not getting added to environment variable. When I manually update the $PATH
variable on EMR master node, it can identify conda
. I want to use conda on Zeppelin.
I also tried adding condig into configuration like below while launching my EMR instance however I still get the below mentioned error.
"classification": "spark-env",
"properties": {
"conda": "/home/hadoop/conda/bin"
}
[hadoop@ip-172-30-5-150 ~]$ PATH=/home/hadoop/conda/bin:$PATH
[hadoop@ip-172-30-5-150 ~]$ conda
usage: conda [-h] [-V] command ...
conda is a tool for managing and deploying applications, environments and packages.
#!/usr/bin/env bash
# Install conda
wget https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh -O /home/hadoop/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p $HOME/conda
conda config --set always_yes yes --set changeps1 no
conda install conda=4.2.13
conda config -f --add channels conda-forge
rm ~/miniconda.sh
echo bootstrap_conda.sh completed. PATH now: $PATH
export PYSPARK_PYTHON="/home/hadoop/conda/bin/python3.5"
echo -e '\nexport PATH=$HOME/conda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc
conda create -n zoo python=3.7 # "zoo" is conda environment name, you can use any name you like.
conda activate zoo
sudo pip3 install tensorflow
sudo pip3 install boto3
sudo pip3 install botocore
sudo pip3 install numpy
sudo pip3 install pandas
sudo pip3 install scipy
sudo pip3 install s3fs
sudo pip3 install matplotlib
sudo pip3 install -U tqdm
sudo pip3 install -U scikit-learn
sudo pip3 install -U scikit-multilearn
sudo pip3 install xlutils
sudo pip3 install natsort
sudo pip3 install pydot
sudo pip3 install python-pydot
sudo pip3 install python-pydot-ng
sudo pip3 install pydotplus
sudo pip3 install h5py
sudo pip3 install graphviz
sudo pip3 install recmetrics
sudo pip3 install openpyxl
sudo pip3 install xlrd
sudo pip3 install xlwt
sudo pip3 install tensorflow.io
sudo pip3 install Cython
sudo pip3 install ray
sudo pip3 install zoo
sudo pip3 install analytics-zoo
sudo pip3 install analytics-zoo[ray]
#sudo /usr/bin/pip-3.6 install -U imbalanced-learn
I got the conda working by modifying the script as below, emr python versions were colliding with the conda version.:
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh -O /home/hadoop/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p $HOME/conda
echo -e '\n export PATH=$HOME/conda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc
conda config --set always_yes yes --set changeps1 no
conda config -f --add channels conda-forge
conda create -n zoo python=3.7 # "zoo" is conda environment name
conda init bash
source activate zoo
conda install python 3.7.0 -c conda-forge orca
sudo /home/hadoop/conda/envs/zoo/bin/python3.7 -m pip install virtualenv
and setting zeppelin python and pyspark parameters to:
“spark.pyspark.python": "/home/hadoop/conda/envs/zoo/bin/python3",
"spark.pyspark.virtualenv.enabled": "true",
"spark.pyspark.virtualenv.type":"native",
"spark.pyspark.virtualenv.bin.path":"/home/hadoop/conda/envs/zoo/bin/,
"zeppelin.pyspark.python" : "/home/hadoop/conda/bin/python",
"zeppelin.python": "/home/hadoop/conda/bin/python"
Orca only support TF upto 1.5 hence it was not working as I am using TF2.