I am working on Azure Databricks, and its compute server runs Ubuntu 18.04. I want to install the catboost R package, but without internet access for security reasons. I downloaded the catboost GitHub repo on my MacBook, which has internet access, and zipped it to upload to Azure for manual installation. I performed the following steps:
The catboost installation instructions require libc6-dev, so I re-installed build-essential by downloading it from this link, uploading it to Ubuntu, and running the following bash command to make it available: sudo dpkg -i /dbfs/FileStore/tables/build_essential_12_4ubuntu1_amd64.deb
Using my MacBook (which has internet), I cloned the GitHub repo from here and archived it in the macOS Terminal: tar czf catboost.tar.gz catboost
I uploaded catboost.tar.gz to Azure and made it available on Ubuntu.
I extracted it on Ubuntu and ran the build with: R CMD build /home/catboost_tmp/catboost
After the build command, I successfully got a source archive: catboost_0.26.tar.gz. I ran the following command in R to install catboost:
install.packages("catboost_0.26.tar.gz", lib = "/databricks/spark/R/lib", type = "source", repos = NULL, verbose = TRUE)
Installation results in the following error:
system (cmd0): /usr/lib/R/bin/R CMD INSTALL
* installing *source* package ‘catboost’ ...
** using staged installation
checking for R_HOME... /usr/lib/R
checking for R... /usr/lib/R/bin/R
checking for local CATBOOST_DYNLIB... no
checking whether we can fetch CatBoost dynlib... downloading CatBoost (libcatboostr.so - v0.26)
trying URL 'https://github.com/catboost/catboost/releases/download/v0.26/libcatboostr-linux.so'
Error in download.file(url, dest_fpath, mode = "wb"): cannot open URL 'https://github.com/catboost/catboost/releases/download/v0.26/libcatboostr-linux.so'
Error: Stopping on error
In addition: Warning message:
In download.file(url, dest_fpath, mode = "wb") :
URL 'https://github.com/catboost/catboost/releases/download/v0.26/libcatboostr-linux.so': status was 'Couldn't connect to server'
Execution halted
*** CatBoost dynamic library download failed. stopping.
ERROR: configuration failed for package ‘catboost’
* removing ‘/databricks/spark/R/lib/catboost’
It seems that it's trying to connect to GitHub to fetch libcatboostr-linux.so. I therefore created a new environment variable CATBOOST_DYNLIB using a bash command (echo "CATBOOST_DYNLIB=/dbfs/FileStore/tables/catboost_pkg/" >> /etc/environment) and downloaded libcatboostr-linux.so from here. But I get the same error message!
Are there any experienced developers here who can help me install catboost (R package) without internet access? Thanks for reading my question.
I solved it on my own; here is the solution for others facing similar issues. Creating the environment variable CATBOOST_DYNLIB was the right approach, but its value must be the complete path, including the file name libcatboostr-linux.so. I had only used the path to the directory containing this file!
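For anyone who wants to guard against the same mistake, here is a minimal shell sketch. The helper name resolve_dynlib is hypothetical (it is not part of catboost); it simply turns a directory into the full file path and checks that the file exists. The commented usage lines reuse the paths from my question and assume the install is launched from the same shell, so that the exported variable is visible to R.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of catboost): print the full path to the
# CatBoost dynamic library, accepting either the file itself or its folder.
resolve_dynlib() {
  local path="$1"
  local lib_name="libcatboostr-linux.so"

  # If a directory was given, point at the library file inside it.
  if [ -d "$path" ]; then
    path="${path%/}/$lib_name"
  fi

  # CATBOOST_DYNLIB must name an existing file, not a directory.
  if [ -f "$path" ]; then
    printf '%s\n' "$path"
  else
    echo "CatBoost library not found: $path" >&2
    return 1
  fi
}

# Example usage on the cluster (paths taken from the question):
# export CATBOOST_DYNLIB="$(resolve_dynlib /dbfs/FileStore/tables/catboost_pkg/)"
# R CMD INSTALL --library=/databricks/spark/R/lib catboost_0.26.tar.gz
```

With the variable pointing at the file itself, the configure step should find the local library instead of trying to download it from GitHub.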