Tags: rhadoop, hdfs, rstudio, rstudio-server

Unable to install rhdfs on CentOS


$ sudo R CMD INSTALL rhdfs
* installing to library ‘/usr/lib64/R/library’
* installing *source* package ‘rhdfs’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
  converting help for package ‘rhdfs’
    finding HTML links ... done
    hdfs-file-access                        html  
    hdfs-file-manip                         html  
    hdfs.defaults                           html  
    hdfs.file-level                         html  
    initialization                          html  
    rhdfs                                   html  
    text.files                              html  
** building package indices
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
  call: fun(libname, pkgname)
  error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/usr/lib64/R/library/rhdfs’

I have tried many times, still no luck. I am unable to install rhdfs and rmr2. I have already set HADOOP_CMD, JAVA_HOME, and PATH, and installed rJava in the R environment on Cloudera. I am unable to load rhdfs at all. Please help with this. Or should I uninstall everything (R and RStudio) and reinstall from scratch?

When I try installing rhdfs in R it gives me this error:

> install.packages("rhdfs")
Installing package into ‘/home/supstat/R/x86_64-unknown-linux-gnu-library/2.13’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘rhdfs’ is not available (for R version 3.1.0)

Solution

  • I faced several issues while trying to install RHadoop, and all of them had to do with rJava. First, export the HADOOP_CMD and HADOOP_STREAMING variables. Then point the dynamic linker's library path (LD_LIBRARY_PATH) at the JVM library directory of your Java installation:

    export LD_LIBRARY_PATH=/usr/lib/jvm/java-7-oracle/jre/lib/amd64/server
    

    Then, you need to run the following command:

    R CMD javareconf -e
    

    After that, you should be able to install rhdfs and rmr2. If I remember correctly, rmr2 has to be installed before rhdfs, or maybe it was the other way around.

    EDIT: Try setting the variables and installing from inside R:

    Sys.setenv(HADOOP_CMD="the same value you used outside R")
    Sys.setenv(HADOOP_STREAMING="same as above")
    install.packages("rhdfs_1.0.8.tar.gz", repos=NULL, type="source")
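
The shell side of the answer (export the variables, then install) can be sketched as below. This is a minimal sketch under stated assumptions, not the answerer's exact procedure: both default paths are hypothetical placeholders, and `check_var` and `install_cmd` are helper names made up for this example. It also covers one possible cause that the thread does not mention: the question runs the install through `sudo`, which by default resets most of the environment, so a HADOOP_CMD exported in your own shell may never reach R. Passing the variables explicitly with `sudo env` avoids that.

```shell
#!/bin/sh
# Sketch only: both default paths are hypothetical placeholders.
# Replace them with the locations used by your own Hadoop installation.
: "${HADOOP_CMD:=/usr/bin/hadoop}"
: "${HADOOP_STREAMING:=/path/to/hadoop-streaming.jar}"
export HADOOP_CMD HADOOP_STREAMING

# check_var NAME: report whether variable NAME is set and names an existing file.
check_var() {
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "$1: not set"
        return 1
    elif [ ! -e "$value" ]; then
        echo "$1: set to '$value', but that file does not exist"
        return 1
    fi
    echo "$1: ok"
}

# install_cmd TARBALL: print an install command that hands both variables
# to R explicitly, so they survive sudo's environment reset.
install_cmd() {
    printf '%s\n' "sudo env HADOOP_CMD=$HADOOP_CMD HADOOP_STREAMING=$HADOOP_STREAMING R CMD INSTALL $1"
}

# Only suggest the install command once both variables check out.
if check_var HADOOP_CMD && check_var HADOOP_STREAMING; then
    install_cmd rhdfs_1.0.8.tar.gz
fi
```

The script only prints the install command so you can inspect it before running it. If your sudoers policy permits it, `sudo -E R CMD INSTALL rhdfs_1.0.8.tar.gz` is an alternative that preserves the whole calling environment instead of naming each variable.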