Installing R-packages in Azure Data Lake Analytics


I am having trouble installing the R packages below and referencing them in an R script that I have embedded in a U-SQL script. I have successfully run a simple R script in a U-SQL job that required no special packages. Now I am trying to create an R script that uses dplyr, tidyr and reshape2. I downloaded these three packages manually as both .zip and .tar.gz files and uploaded them to my ADL account. Example:

../usqlext/samples/R/dplyr_0.7.7.zip

The U-SQL starts like this:

REFERENCE ASSEMBLY [ExtR];   //enable R extensions for the U-SQL Script

DEPLOY RESOURCE @"/usqlext/samples/R/dplyr_0.7.7.zip";
DEPLOY RESOURCE @"/usqlext/samples/R/reshape2_1.4.3.zip";
DEPLOY RESOURCE @"/usqlext/samples/R/tidyr_0.8.1.zip";

The R-script starts like this:

// declare the R script as a string variable and pass it as a parameter to the Reducer:
DECLARE @myRScript = @"
install.packages('dplyr_0.7.7.zip', repos = NULL) # installing package
unzip('dplyr_0.7.7.zip')
require(dplyr)

install.packages('tidyr_0.8.1.zip', repos = NULL) # installing package
unzip('tidyr_0.8.1.zip')
require(tidyr)

install.packages('reshape2_1.4.3.zip', repos = NULL) # installing package
unzip('reshape2_1.4.3.zip')
require(reshape2)

However, I keep getting errors that suggest the packages are still not installed successfully. Currently I get the following error message:

Unhandled exception from user code: "Error in function_list[[i]](value) : could not find function "group_by"

That error comes from the following piece of R-code:

longStandardized <- dataset %>%
    group_by(InstallationId) %>%
    mutate(stdConsumption = znorm(tmp)) %>%
    select(InstallationId, Hournumber, stdConsumption)

Hope that someone can see what I am missing.

Thanks Jon


Solution

  • The easy way is to download this file from your Data Lake store: usqlext\assembly\R\MRS.9.1.0.zip

    Then unzip the file (on a machine without R installed) and run R.exe from the bin folder.

    Now you can install all the packages you want (with the parameter dependencies = TRUE):

    install.packages('yourpackage', dependencies = TRUE)
    

    Zip the folder again and replace the file in the Data Lake store with the one you created.

    Execute RegisterAllAssemblies.USQL again, and your package will be available for you!

    library('yourpackage')
    

    If you get a "package not found" error, you need this trick:

    libpath = .libPaths()[1]
    install.packages('yourpackage', lib = libpath)
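
One detail that may explain why the failure in the question shows up so late: require() only returns FALSE with a warning when a package cannot be loaded, so the script carries on and stops later at the first dplyr call with could not find function "group_by". Below is a minimal sketch of a stricter load step that fails right away and prints the library paths R searched. The helper name load_or_stop is hypothetical. It is not part of ExtR or of the steps above, and it uses only base R functions.

```r
# Hypothetical helper (an assumption, not part of the original answer):
# attach a package, or stop immediately with a message that lists the
# library paths R searched.
load_or_stop <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Package '%s' is not installed in any of: %s",
                 pkg, paste(.libPaths(), collapse = "; ")),
         call. = FALSE)
  }
  # character.only = TRUE lets library() accept the name stored in `pkg`.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage inside the R script, before any dplyr verbs are called:
# load_or_stop("dplyr")
```

If the message shows a library path other than the one the package was installed into, that would point to the lib = libpath trick above as the likely fix.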