I want to set a default custom CRAN mirror for R on Databricks, but changes to the Rprofile.site file do not seem to be recognized at all.
I have already read the official Microsoft documentation on how to customize the R session in Databricks:
https://learn.microsoft.com/en-us/azure/databricks/sparkr/#r-session-customization
The value of R_HOME is /usr/lib/R, so I adjusted my Databricks cluster-scoped init script to append the following lines to the /usr/lib/R/etc/Rprofile.site file:
local({
options(repos = c(CRAN = "<my_custom_cran_url>"))
})
The init script itself works perfectly fine.
However, if I run getOption("repos") within an R notebook, I get the following output:
Cloud MRAN
"https://cloud.r-project.org/" "https://cran.microsoft.com/"
These are still the initial default CRAN settings, which means they were not overwritten by my custom CRAN URL in the Rprofile.site file.
If I run the lines mentioned above (local({...repos...})) directly in an R notebook, getOption("repos") outputs the desired entry:
CRAN
"<my_custom_cran_url>"
Maybe the file /usr/lib/R/etc/Rprofile.site is not executed at all, although the Microsoft documentation says it is?
Does anyone have a suggestion?
The Databricks Runtime version: 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12)
TL;DR: Use the undocumented DATABRICKS_DEFAULT_R_REPOS environment variable and set its value to a ':'-delimited list of repo URLs.
For example
I hit the same issue and can confirm that /usr/lib/R/etc/Rprofile.site is being executed: an option set under a different name in Rprofile.site does show up in the notebook.
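A minimal sketch of that check, in case it helps anyone reproduce it. The option name my.rprofile.marker is made up for illustration; any name that nothing else sets should work. The first part is what the init script would append to Rprofile.site, the second part is what you would run in the notebook.

```r
# Part 1: lines appended to /usr/lib/R/etc/Rprofile.site by the init script.
# "my.rprofile.marker" is a hypothetical option name used only as a marker.
local({
  options(my.rprofile.marker = "Rprofile.site was sourced")
})

# Part 2: run in an R notebook after the cluster has restarted.
# A non-NULL value here means Rprofile.site was executed.
print(getOption("my.rprofile.marker"))

# Compare with the repos option, which may have been overwritten later.
print(getOption("repos"))
```

If the marker is present but repos still shows the defaults, Rprofile.site did run and something later replaced the repos option.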
The issue is that another R profile script (/local_disk0/tmp/_CleanRShell.*.r) is executed after Rprofile.site and overwrites any repos option. Luckily, this code is controlled by the DATABRICKS_DEFAULT_R_REPOS environment variable.
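To see whether the variable reached the R session, something like the following could be run in a notebook. This is only a sketch: repos_report is a helper name I made up, and it assumes the variable was set in the cluster's environment variables and the cluster was restarted afterwards.

```r
# Hypothetical helper: collect the two values relevant to this diagnosis.
repos_report <- function() {
  list(
    # NA if the variable is not set in this R session's environment
    env_value = Sys.getenv("DATABRICKS_DEFAULT_R_REPOS", unset = NA),
    # The repositories the session actually ended up with
    repos = getOption("repos")
  )
}

str(repos_report())
```

If env_value is NA, the variable never made it into the session, so the later profile script would still fall back to its defaults.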