
Installing RHadoop on a Hadoop Cluster


I am trying to install RHadoop on top of my Hadoop cluster. While installing some of the required packages, I get the following errors:

> install.packages("Megh/rmr2_3.3.1.tar.gz")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from 'pkgs'
Error in rawToChar(block[seq_len(ns)]) :
  embedded nul in string: 'rmr2/man/fromdfstodfs.Rd\0\0\0\0erties\n i-_". '
Warning message:
In install.packages("Megh/rmr2_3.3.1.tar.gz") :
  installation of package ‘Megh/rmr2_3.3.1.tar.gz’ had non-zero exit status
>

> install.packages("Megh/plyrmr_0.6.0.tar.gz")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from 'pkgs'
Warning in untar2(tarfile, files, list, exdir, restore_times) :
  checksum error for entry 'plyrmr/man/as.data.framed'
Warning in readBin(con, "raw", n = 512L) :
  invalid or incomplete compressed data
Error in untar2(tarfile, files, list, exdir, restore_times) :
  incomplete block on file
Warning message:
In install.packages("Megh/plyrmr_0.6.0.tar.gz") :
  installation of package ‘Megh/plyrmr_0.6.0.tar.gz’ had non-zero exit status

I have also installed RHive on the cluster. I can run relatively small queries through RHive, but larger queries fail:

> rhive.query("SELECT COUNT(*) FROM tradehistory")
Error: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
> rhive.query("SELECT tradeno FROM tradehistory LIMIT 10")
    tradeno
1  34232193
2  34232198
3  34232199
4  34232200
5  34232201
6  34232202
7  34232203
8  34232204
9  34232205
10 34232206

If anybody has any idea please help me out with this! Thanks a lot in advance!


Solution

  • The installation errors turned out to be caused by the tar files themselves.

    I had downloaded the tar files on a Windows machine and transferred them to my cluster with WinSCP.

    Archives such as zip or tar.gz files should be transferred in binary mode; otherwise some bytes of the file can be altered or lost (text mode, for example, rewrites line endings).

    A tar file damaged in this way is what produced the errors shown in the question.

  • As for the Tez error: a query that has to launch multiple MapReduce tasks cannot run without proper authorization. In my case this affected the COUNT(*) query but not the simple SELECT ... LIMIT query.

    When I ran the same RHive query after supplying a username and password, I got the expected results.
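
To check whether a transferred tar file is intact before calling install.packages(), something like the following might help. It is only a minimal sketch: archive_is_readable is a hypothetical helper name (not part of R, rmr2, or plyrmr), and the path is simply the one from the question. It lists the archive with R's built-in reader and prints an MD5 checksum that can be compared with one computed on the machine where the file was downloaded.

```r
# Hypothetical helper (not part of R, rmr2, or plyrmr): returns TRUE when every
# entry of the archive can be listed without a warning or an error.
archive_is_readable <- function(path) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  tryCatch(
    {
      # tar = "internal" forces R's own tar reader (the untar2 function that
      # appears in the error messages in the question).
      entries <- untar(path, list = TRUE, tar = "internal")
      length(entries) > 0
    },
    warning = function(w) FALSE,  # e.g. checksum or compressed-data warnings
    error = function(e) FALSE     # e.g. "incomplete block on file"
  )
}

pkg <- "Megh/rmr2_3.3.1.tar.gz"  # path taken from the question
if (file.exists(pkg)) {
  cat("readable:", archive_is_readable(pkg), "\n")
  # Compare this value with the checksum of the file on the source machine.
  cat("md5:", unname(tools::md5sum(pkg)), "\n")
}
```

If the helper returns FALSE, or the checksum differs from the one on the source machine, transferring the file again in binary mode is probably the first thing to try.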