Tags: r, ansible, hadoop-streaming

R install packages from Shell


I am trying to implement a reducer for Hadoop Streaming using R. However, I need a way to access libraries that are not built into R, such as dplyr. Based on my research, it seems there are two approaches:

(1) In the reducer code, install the required libraries into a temporary folder, which is discarded when the session ends, like this:

.libPaths(c(.libPaths(), temp <- tempdir()))
install.packages("dplyr", lib=temp, repos='http://cran.us.r-project.org')
library(dplyr)
...

However, this approach adds significant overhead that grows with the number of libraries installed. Most of the run time would be wasted on installation (a sophisticated library like dplyr has many dependencies, which take minutes to install in a vanilla R session).

So it sounds like I need to install the libraries beforehand, which leads to approach 2.

(2) My cluster is fairly big, so I have to use a tool like Ansible to roll this out, and I would prefer a single Linux shell command that installs the library. I have seen R CMD INSTALL... before, but as far as I can tell it only installs a package from a source file. It does not do what install.packages() does in the R console: figure out the mirror, pull the source file, and install it, all in one command.

Can anyone show me how to non-interactively install an R package with a single shell command? (Sorry for so much background. If anyone thinks I am not following the right philosophy, feel free to leave a comment on how R packages should be managed across a cluster.)


Solution

  • You may find littler useful. It is a command-line front-end / variant of R (which uses the R-embedding interface).

    I use the install.r script all the time to install packages from the shell. There is a second variant with more command-line argument parsing, but it has an added dependency.
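
For clusters where littler is not available, the same one-command install can be sketched with plain Rscript, which ships with R. This is a minimal sketch, not the answerer's own method: the helper names r_install_expr and r_install and the mirror URL https://cloud.r-project.org are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: install CRAN packages non-interactively from the shell.
# Assumptions (not from the original post): the mirror URL and the
# helper names r_install_expr / r_install are illustrative choices.

CRAN_REPO="https://cloud.r-project.org"

# Build the R expression that installs the given packages.
# Example: r_install_expr dplyr tidyr
r_install_expr() {
    pkgs=""
    for p in "$@"; do
        # Append each name as a quoted R string, comma-separated.
        pkgs="${pkgs:+$pkgs, }\"$p\""
    done
    printf 'install.packages(c(%s), repos = "%s")\n' "$pkgs" "$CRAN_REPO"
}

# Run the expression with Rscript; no interactive mirror prompt appears
# because repos is passed explicitly.
r_install() {
    Rscript -e "$(r_install_expr "$@")"
}

# With littler installed and its install.r script on the PATH,
# the equivalent call would be:
#   install.r dplyr
```

Calling `r_install dplyr` then runs a single `Rscript -e 'install.packages(...)'` command, which is easy to wrap in an Ansible command task. Note that install.packages() reports a failed installation as a warning rather than an error, so a provisioning step may want to check afterwards that the package actually loads.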