java hadoop virtualbox hadoop-streaming rhadoop

Virtual machines containing RHadoop and the hadoop-streaming.jar


A local test instance of Hadoop looks like a bit of a bear to configure, even after consulting the following very clear but still very complicated references:

Are there recommended VMs that ship with a properly configured hadoop-streaming.jar and RHadoop?


Solution

  • First of all, RHadoop is deprecated. Use rhdfs, rhbase, rmr2, plyrmr, and quickcheck instead. AFAIK, there is no VM that ships with both Hadoop streaming and R installed. So pick a VM from Cloudera, Hortonworks, or MapR, install R on it, and then install the required R packages on top of that.
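
Once R is installed on the chosen VM, it helps to confirm that the pieces the R packages depend on are actually present. rmr2 reads the HADOOP_CMD and HADOOP_STREAMING environment variables, so the values worth locating are the path to the hadoop command and the path to the streaming jar. Below is a minimal Python sketch of such a check. The function names and the idea of passing search directories on the command line are assumptions made for illustration, not part of any distribution's tooling, and the jar's location differs between vendors.

```python
import glob
import os
import shutil
import sys


def find_streaming_jars(search_roots):
    """Return sorted paths of hadoop-streaming*.jar files under the given roots."""
    found = []
    for root in search_roots:
        # "**" with recursive=True matches the root itself and any subdirectory.
        pattern = os.path.join(root, "**", "hadoop-streaming*.jar")
        found.extend(glob.glob(pattern, recursive=True))
    return sorted(set(found))


def check_prerequisites(search_roots):
    """Locate the hadoop command, Rscript, and a streaming jar (None if missing)."""
    jars = find_streaming_jars(search_roots)
    return {
        "hadoop_cmd": shutil.which("hadoop"),    # candidate value for HADOOP_CMD
        "rscript": shutil.which("Rscript"),      # confirms R is on the PATH
        "streaming_jar": jars[0] if jars else None,  # candidate for HADOOP_STREAMING
    }


if __name__ == "__main__":
    # Directories to search are taken from the command line (hypothetical usage).
    for name, value in check_prerequisites(sys.argv[1:]).items():
        print(f"{name}: {value or 'NOT FOUND'}")
```

Run it inside the VM with the directories where the distribution is likely to keep its jars as arguments. Any entry reported as NOT FOUND points at a step that still needs doing before installing rmr2 and the other packages.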