How to set a custom R installation for using rpy2 in Jupyter?


I have a conda environment that I made available as a kernel in my Jupyter instance by running:

python -m ipykernel install --user --name my-env-name --display-name "Python (my-env-name)"

With this environment I wanted to use R in Jupyter, using rpy2's %load_ext rpy2.ipython command to enable the %%R magic. However, rpy2 uses my global R installation rather than the one installed in my conda environment. I checked R's home directory via:

%%R
R.home()

(I can also check this with %run -m rpy2.situation in a Jupyter notebook. However, this seems to have been broken in some rpy2 versions: at least for me it threw UnboundLocalError: local variable 'rpy2' referenced before assignment in 3.1.0, and it worked in 3.2.1.)

How can I make my Jupyter notebook load the R installation from my conda environment?


Solution

  • There are two ways to solve this: a local one (for individual Jupyter notebooks) and a global one (for the kernel itself). Both rely on setting the R_HOME environment variable.

    Local: Before calling %load_ext rpy2.ipython in your Jupyter notebook, run:

    import os
    os.environ['R_HOME'] = '/home/your/anaconda3/envs/myenv/lib/R'  # path to your R installation
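
    To avoid hard-coding the path, the local approach could be sketched as follows. This is only a sketch: conda_r_home is a hypothetical helper, and it assumes the conda layout <prefix>/lib/R seen in the path above and that the notebook kernel runs inside the target environment (so sys.prefix points at it).

```python
import os
import sys


def conda_r_home(prefix=sys.prefix):
    """Return the R home expected inside a conda environment.

    Assumes the conda layout <prefix>/lib/R (as in the path above);
    other installations may place R elsewhere.
    """
    return os.path.join(prefix, "lib", "R")


r_home = conda_r_home()
# Only override R_HOME if that directory really exists in this environment.
if os.path.isdir(r_home):
    os.environ["R_HOME"] = r_home
```

    As with the hard-coded version, this has to run before %load_ext rpy2.ipython.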
    

    Global: Find your kernel directory via jupyter kernelspec list and edit the kernel.json file in it. Add the following entry to the JSON: "env": {"R_HOME":"/home/your/anaconda3/envs/my-env-name/lib/R"}. Then restart your kernel (you might have to restart Jupyter as well).
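
    If you prefer not to edit kernel.json by hand, the same change could be scripted. A minimal sketch, assuming you already know the path to kernel.json from jupyter kernelspec list; set_kernel_env is a hypothetical helper, not part of Jupyter:

```python
import json


def set_kernel_env(kernel_json_path, name, value):
    """Add or update one variable in the "env" block of a kernel.json file."""
    with open(kernel_json_path, encoding="utf-8") as fh:
        spec = json.load(fh)
    # "env" may be missing in a freshly installed kernelspec.
    spec.setdefault("env", {})[name] = value
    with open(kernel_json_path, "w", encoding="utf-8") as fh:
        json.dump(spec, fh, indent=1)
    return spec
```

    You would call it as set_kernel_env(path_to_kernel_json, "R_HOME", "/home/your/anaconda3/envs/my-env-name/lib/R") and then restart the kernel. All other keys in the file are left as they were.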

    Update (messed up LD_LIBRARY_PATH)

    Recently, I tried running rpy2 in Jupyter again after setting up a new environment using conda:

    conda config --add channels conda-forge
    conda config --set channel_priority strict
    conda create -n myenv python=3.7
    conda activate myenv
    conda install r-essentials pandas rpy2
    

    And this time I ran into the following issue when trying to either %load_ext rpy2.ipython (Jupyter) or simply import rpy2.robjects (any script):

    >>> import rpy2.robjects                                            
    Warning message:                                                    
    package ‘methods’ was built under R version 3.6.3     
    Error: package or namespace load failed for ‘stats’ in dyn.load(file, DLLpath = DLLpath, ...):
     unable to load shared object '/home/your/anaconda3/envs/myenv/lib/R/library/stats/libs/stats.so':                  
      /home/your/anaconda3/envs/myenv/lib/R/library/stats/libs/stats.so: undefined symbol: MARK_NOT_MUTABLE
    During startup - Warning messages:                                                                                          
    1: package ‘datasets’ was built under R version 3.6.3      
    2: package ‘utils’ was built under R version 3.6.3                                                                     
    3: package ‘grDevices’ was built under R version 3.6.3  
    4: package ‘graphics’ was built under R version 3.6.3                                                                       
    5: package ‘stats’ was built under R version 3.6.3          
    6: package ‘stats’ in options("defaultPackages") was not found                                                       
    R[write to console]: Error: package or namespace load failed for ‘tools’ in dyn.load(file, DLLpath = DLLpath, ...):
     unable to load shared object '/home/your/anaconda3/envs/myenv/lib/R/library/tools/libs/tools.so':
      /home/your/anaconda3/envs/myenv/lib/R/library/tools/libs/tools.so: undefined symbol: R_NewPreciousMSet
    
    R[write to console]: Error in dyn.load(file, DLLpath = DLLpath, ...) :
      unable to load shared object '/home/your/anaconda3/envs/myenv/lib/R/library/tools/libs/tools.so':
      /home/your/anaconda3/envs/myenv/lib/R/library/tools/libs/tools.so: undefined symbol: R_NewPreciousMSet
    
    R[write to console]: In addition:                                      
    R[write to console]: Warning message:                        
    
    R[write to console]: package ‘tools’ was built under R version 3.6.3
    
    Traceback (most recent call last):                          
      File "<stdin>", line 1, in <module>                    
      File "/home/your/anaconda3/envs/myenv/lib/python3.7/site-packages/rpy2/robjects/__init__.py", line 20, in <module>
        import rpy2.robjects.functions                                           
      File "/home/your/anaconda3/envs/myenv/lib/python3.7/site-packages/rpy2/robjects/functions.py", line 12, in <module>
        from rpy2.robjects import help                                   
      File "/home/your/anaconda3/envs/myenv/lib/python3.7/site-packages/rpy2/robjects/help.py", line 43, in <module>
        tools_ns = _get_namespace(StrSexpVector(('tools',)))          
      File "/home/your/anaconda3/envs/myenv/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py", line 44, in _
        cdata = function(*args, **kwargs)                                     
      File "/home/your/anaconda3/envs/myenv/lib/python3.7/site-packages/rpy2/rinterface.py", line 621, in __call__
        raise embedded.RRuntimeError(_rinterface._geterrmessage())                            
    rpy2.rinterface_lib.embedded.RRuntimeError: Error in dyn.load(file, DLLpath = DLLpath, ...) :
      unable to load shared object '/home/your/anaconda3/envs/myenv/lib/R/library/tools/libs/tools.so':
      /home/your/anaconda3/envs/myenv/lib/R/library/tools/libs/tools.so: undefined symbol: R_NewPreciousMSet
    

    The cause seemed to be a broken R "situation" (check via %run -m rpy2.situation in Jupyter or simply python -m rpy2.situation on the command line): the entry "R's additions to LD_LIBRARY_PATH" pointed to an old, globally installed R version.

    I had to manually unset LD_LIBRARY_PATH to solve this issue. This variable can be set or unset in the same way as R_HOME.
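
    One way to check whether a cleaned-up environment fixes things is to run rpy2.situation in a child process. This is a sketch under assumptions: clean_r_env and run_situation are hypothetical helper names, and a child process is used because changing LD_LIBRARY_PATH inside an already running Python process may come too late for the dynamic loader.

```python
import os
import subprocess
import sys


def clean_r_env(r_home, base_env=None):
    """Return a copy of the environment with R_HOME set and LD_LIBRARY_PATH removed."""
    env = dict(os.environ if base_env is None else base_env)
    env["R_HOME"] = r_home
    env.pop("LD_LIBRARY_PATH", None)
    return env


def run_situation(env):
    """Run `python -m rpy2.situation` in a child process that uses `env`."""
    return subprocess.run(
        [sys.executable, "-m", "rpy2.situation"], env=env, check=False
    )


# Example (not executed here):
# run_situation(clean_r_env("/home/your/anaconda3/envs/myenv/lib/R"))
```

    If the report looks right in the child process, the same two variables are the ones to fix where they are actually set (kernel.json, .bashrc, etc.).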

    PS: I found that R_HOME and LD_LIBRARY_PATH were set in my .bashrc to a custom (built from source) R installation, which obviously confused the Jupyter kernel. Not smart ;)

    PPS: rpy2.situation still reports Warning: The environment variable R_HOME differs from the default R in the PATH:

    Looking for R's HOME:
        Environment variable R_HOME: /home/your/anaconda3/envs/myenv/lib/R
        Calling `R RHOME`: /home/your/anaconda3/envs/jupyter-env/lib/R
        Environment variable R_LIBS_USER: None
        Warning: The environment variable R_HOME differs from the default R in the PATH.
    

    This makes me worry that R actually defaults to the R installed alongside the Jupyter installation. If anybody has comments about this, I would be grateful.
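
    The comparison behind that warning can be reproduced by hand. A rough sketch, with hypothetical helper names; it only compares the R_HOME variable with the output of R RHOME for the first R found on PATH and does not by itself tell you which R rpy2 ends up loading:

```python
import os
import subprocess


def same_r_home(env_r_home, rhome_output):
    """Compare R_HOME with the output of `R RHOME`, ignoring whitespace and symlinks."""
    if not env_r_home or not rhome_output:
        return False
    return os.path.realpath(env_r_home.strip()) == os.path.realpath(rhome_output.strip())


def r_home_from_path():
    """Return the output of `R RHOME` for the first R on PATH, or None if that fails."""
    try:
        result = subprocess.run(
            ["R", "RHOME"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout


# Example (not executed here):
# print(same_r_home(os.environ.get("R_HOME"), r_home_from_path()))
```

    In the report above, the two values differ because the R on PATH belongs to jupyter-env while R_HOME points at myenv.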