
R installation on Hadoop Cluster


I'm setting up R on an existing Hadoop cluster. So far I've installed the R RPMs and related library packages on one node of the cluster (the edge node), and it works as expected. Do the R RPMs need to be installed on every server in the cluster, or is it enough to sync the library directory (in my case /usr/lib64/R/library) across all the servers?


Solution

  • For rmr you need to install R everywhere; for rhdfs you don't; for rhive I don't know. "Install" here means the R RPMs or equivalent, plus the necessary dependencies. As for syncing library directories, I tried something similar to simplify the deployment of rmr2, but the client and I agreed to pull the plug because it seemed a very brittle strategy: it depends on all the libraries being perfectly identical on every node. It did work in a very controlled environment, but there we synced the whole thing, not just the library.
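
Since the brittleness comes from libraries drifting apart between nodes, one way to catch it early is to compare the installed R package versions across the cluster. The following is only a rough sketch, not part of rmr2 or any RHadoop tooling: the `find_mismatches` helper, the host names, and the version numbers are all made up for illustration, and it assumes you have already collected a package-to-version map from each node.

```python
from typing import Dict, Optional

# Hypothetical inventory: hostname -> {package name -> version string}.
# Each map could be collected by running, on every node, something like
#   Rscript -e 'ip <- installed.packages(); cat(paste(ip[, "Package"], ip[, "Version"]), sep = "\n")'
# and parsing the "name version" lines. That collection step is assumed, not shown.
Inventory = Dict[str, Dict[str, str]]


def find_mismatches(inventory: Inventory) -> Dict[str, Dict[str, Optional[str]]]:
    """Return packages whose version differs, or which are missing, on some node."""
    # Union of every package name seen on any node.
    all_packages = set()
    for packages in inventory.values():
        all_packages.update(packages)

    mismatches = {}
    for name in sorted(all_packages):
        # None marks a node where the package is not installed at all.
        versions = {host: pkgs.get(name) for host, pkgs in inventory.items()}
        if len(set(versions.values())) > 1:
            mismatches[name] = versions
    return mismatches


if __name__ == "__main__":
    # Illustrative data only; these hosts and versions are invented.
    example = {
        "edge": {"rmr2": "3.3.1", "Rcpp": "0.12.0"},
        "worker1": {"rmr2": "3.3.1", "Rcpp": "0.11.6"},
        "worker2": {"rmr2": "3.3.1"},
    }
    for package, versions in find_mismatches(example).items():
        print(package, versions)
```

An empty result means every node reports the same version of every package, which is the condition the sync approach silently relies on. It says nothing about the R RPMs themselves or system-level dependencies, so those would still need to be checked separately.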