Tags: r, amazon-emr, apache-zeppelin

How can I install the R interpreter for Zeppelin 0.7.3 on Amazon EMR 5.16.0?


When I create an EMR cluster with release emr-5.16.0 and include Zeppelin, R is installed along with it, but I cannot load the R interpreter. Even after I run "sudo bash bin/install-interpreter.sh -a", it does not show up.


Solution

  • Finally figured it out. As of 5.16.0, EMR does not support R in Zeppelin out of the box, as noted in the EMR documentation.

    I got what I needed by creating the EMR cluster without Zeppelin and then building Zeppelin from source with the R profile enabled (note that the commands check out v0.8.0, not 0.7.3). I ran the following while SSH'd into the main node:

    sudo yum -y update
    sudo yum -y install R R-devel libcurl-devel openssl-devel git
    sudo R -e "install.packages('devtools', repos = 'http://cran.us.r-project.org')"
    sudo R -e "install.packages('sparklyr', repos = 'http://cran.us.r-project.org')"
    sudo R -e "install.packages('evaluate', repos = 'http://cran.us.r-project.org')"
    sudo R -e "install.packages('knitr', repos = 'http://cran.us.r-project.org')"
    sudo R -e "install.packages('ggplot2', repos = 'http://cran.us.r-project.org')"
    sudo R -e "install.packages(c('devtools','mplot', 'googleVis'), repos = 'http://cran.us.r-project.org');
    require(devtools); install_github('ramnathv/rCharts')"
    
    
    mkdir build
    cd build
    wget http://www.eu.apache.org/dist/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
    sudo tar -zxf apache-maven-3.3.9-bin.tar.gz -C /usr/local/
    sudo ln -s /usr/local/apache-maven-3.3.9/bin/mvn /usr/local/bin/mvn
    
    git clone https://github.com/apache/zeppelin.git
    cd zeppelin
    git checkout tags/v0.8.0
    
    mvn clean package -DskipTests -Pscala-2.11 -Pr -Dspark.version=2.2.0 -Dhadoop.version=2.7.7
    # Before starting, change the server port (zeppelin.server.port in conf/zeppelin-site.xml)
    # and add "export SPARK_HOME=/usr/lib/spark" to conf/zeppelin-env.sh
    
    ./bin/zeppelin-daemon.sh start
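
The two manual steps mentioned in the comments above (changing the port and setting SPARK_HOME) could be scripted roughly as follows. This is a minimal sketch, not part of the original answer: the helper names `set_export` and `configure_zeppelin` are made up for illustration, it assumes the port is set through the `ZEPPELIN_PORT` variable in conf/zeppelin-env.sh instead of editing zeppelin-site.xml, and it assumes GNU sed (as on Amazon Linux).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write "export KEY=VALUE" to FILE exactly once.
set_export() {
  local file="$1" key="$2" value="$3"
  # Drop any earlier uncommented export of this key so reruns stay idempotent.
  sed -i "/^export ${key}=/d" "$file"
  echo "export ${key}=${value}" >> "$file"
}

# Hypothetical helper: configure_zeppelin ZEPPELIN_HOME PORT SPARK_HOME
configure_zeppelin() {
  local env_file="$1/conf/zeppelin-env.sh" port="$2" spark_home="$3"
  # Start from the template in the source tree if no env file exists yet.
  [ -f "$env_file" ] || cp "${env_file}.template" "$env_file"
  # Assumption: the port is overridden via ZEPPELIN_PORT (default is 8080).
  set_export "$env_file" ZEPPELIN_PORT "$port"
  set_export "$env_file" SPARK_HOME "$spark_home"
}
```

For example, running `configure_zeppelin "$HOME/build/zeppelin" 8890 /usr/lib/spark` before `./bin/zeppelin-daemon.sh start` would apply both settings; the port 8890 is only a placeholder for whichever port is free on your cluster.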