
Hortonworks Data Platform 2.1 (sandbox) unable to complete a very simple RHadoop job


I installed the rhdfs and rmr2 packages on top of the Hortonworks Data Platform 2.1 sandbox, running as a single-node 64-bit VM with 8 GB of RAM allocated. When I run the following very simple RHadoop job, it runs indefinitely and never completes (there is no runtime error, at least after I increased yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb from their defaults to 4096):

from.dfs(mapreduce(to.dfs(1:100)))

I would appreciate any suggestions on how to get the underlying HDP to complete such a simple RHadoop job.

To make sure that HDP itself still works properly after the RHadoop installation, I killed the RHadoop job and ran the stock MapReduce pi example, which finished normally:

mapred job -kill job_my_rhadoop_job_id
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar pi 16 100000
Job Finished in 70.457 seconds
Estimated value of Pi is 3.14157500000000000000
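
For reference, the two memory settings mentioned in the question correspond to yarn-site.xml entries like the ones below. This is only a sketch of the values described above. Where the file lives on the sandbox, and whether the values are managed through a configuration UI rather than edited by hand, are assumptions; the YARN services normally need a restart before such changes take effect.

```xml
<!-- Sketch of the two properties named in the question, both set to 4096 MB -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
```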

Solution

  • Since I had no luck running RHadoop on HDP, I switched to running H2O on top of HDP, with RStudio/R connecting to H2O remotely. This combination seems to work fine within the resources of my VM. So, in my personal view, avoid using RHadoop on top of HDP.
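
For anyone who wants to narrow the problem down before giving up on RHadoop, one possible diagnostic is to run the same job on rmr2's local backend, which bypasses YARN entirely. This is a sketch, not a fix from the original answer. It assumes rmr2 loads in the R session. If the job completes locally, that would suggest the hang lies in the Hadoop or YARN setup rather than in the R code.

```r
# Assumption: rmr2 is installed and loadable in this R session.
library(rmr2)

# Switch rmr2 from its default "hadoop" backend to the in-process "local"
# backend, so no YARN containers are requested.
rmr.options(backend = "local")

# Same job as in the question: write 1:100, run an identity MapReduce job,
# and read the result back.
result <- from.dfs(mapreduce(to.dfs(1:100)))

# values() extracts the value part of the returned key-value object.
str(values(result))

# Switch back to the cluster backend before retrying on HDP.
rmr.options(backend = "hadoop")
```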