
Installing Remote R Package to Databricks Cluster Rather than Notebook


I am trying to install the prophet package on Databricks. I want to install it directly on my cluster rather than in my notebook. The following code installs it in the notebook:

Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
remotes::install_github("jeroen/V8")
devtools::install_version("rstantools", version = "2.0.0")
install.packages('prophet')

However, I want to install it directly on my cluster. How would I adapt this snippet to install the prophet package on my Databricks cluster?

Here are the options I see when attempting to install a package to a cluster:

[Screenshot: cluster library installation options]

Attempt at downloading directly to cluster:

Command 1

%python
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")

Command 2

%python
dbutils.fs.put("/databricks/scripts/prophet_install_script.R","""
Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
remotes::install_github(\"jeroen/V8\")
devtools::install_version(\"rstantools\", version = \"2.0.0\")
install.packages('prophet')
""", True)

Command 3

%python
dbutils.fs.put("/databricks/scripts/stock_cluster_init_script_v1.sh","""
#!/bin/bash
R CMD BATCH /dbfs/databricks/scripts/prophet_install_script.R
""", True)

Then I went to my new cluster and ran it with this init script:

[Screenshots: cluster init script configuration]

It then provided me the following error:

{
  "reason": {
    "code": "INIT_SCRIPT_FAILURE",
    "type": "CLIENT_ERROR",
    "parameters": {
      "instance_id": "i-0c71b23287fb81530",
      "databricks_error_message": "Cluster scoped init script dbfs:/databricks/scripts/stock_cluster_init_script_v1.sh failed: Script exit status is non-zero"
    }
  }
}

Solution

  • If you aren't on the community edition, then you can use the cluster init script to perform this installation (you can install other libraries there as well).

    Put the R commands into a file on DBFS (see the linked docs for how to use dbutils.fs.put for that). You also need to set the CRAN mirror explicitly:

    local({r <- getOption("repos")
           r["CRAN"] <- "http://cran.r-project.org"
           options(repos=r)
    })
    Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
    remotes::install_github("jeroen/V8")
    devtools::install_version("rstantools", version = "2.0.0")
    install.packages('prophet')
    

    Then create an init script with the following content:

    #!/bin/bash
    
    Rscript --verbose  /dbfs/<path-to-file>
    

    Please note that <path-to-file> should be given without the dbfs: prefix, because the init script sees DBFS mounted under /dbfs.
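
The steps above could be combined roughly as follows. This is only a sketch, written in Python because the question uses %python cells. The init script file name and the helper names are hypothetical, and it assumes the remotes and devtools packages are already available on the cluster. The writer is passed in as `put` and is assumed to behave like dbutils.fs.put(path, contents, overwrite), as used in the question.

```python
# Sketch only: file locations and helper names are illustrative assumptions.

R_SCRIPT_URI = "dbfs:/databricks/scripts/prophet_install_script.R"
INIT_SCRIPT_URI = "dbfs:/databricks/scripts/prophet_init_script.sh"  # hypothetical name

# R commands from the answer; plain quotes are fine inside a triple-quoted string.
R_SCRIPT = """\
local({r <- getOption("repos")
       r["CRAN"] <- "http://cran.r-project.org"
       options(repos=r)
})
Sys.setenv(DOWNLOAD_STATIC_LIBV8 = 1)
remotes::install_github("jeroen/V8")
devtools::install_version("rstantools", version = "2.0.0")
install.packages('prophet')
"""


def dbfs_uri_to_local_path(uri):
    """Map 'dbfs:/a/b' to '/dbfs/a/b', the path used by shell commands in the init script."""
    prefix = "dbfs:"
    if not uri.startswith(prefix + "/"):
        raise ValueError(f"expected a dbfs:/ URI, got {uri!r}")
    return "/dbfs" + uri[len(prefix):]


def build_init_script(r_script_uri):
    """Return a bash init script that runs the R script with Rscript."""
    local_path = dbfs_uri_to_local_path(r_script_uri)
    return "#!/bin/bash\n\nRscript --verbose " + local_path + "\n"


def write_install_files(put):
    """Write the R script and the init script through a dbutils.fs.put-like callable."""
    put(R_SCRIPT_URI, R_SCRIPT, True)
    put(INIT_SCRIPT_URI, build_init_script(R_SCRIPT_URI), True)
```

In a notebook cell this would be called as `write_install_files(dbutils.fs.put)`, after which the cluster's init script setting would point at the dbfs:/ location of the .sh file.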