python installation google-colaboratory rdkit

Why do we have to append a path for RDKit in Google Colab?


!chmod +x Miniconda3-py37_4.8.3-Linux-x86_64.sh
!time bash ./Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -f -p /usr/local
!time conda install -q -y -c conda-forge rdkit

import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

In this code, why do we have to append the path after rdkit is installed?

i.e. sys.path.append('/usr/local/lib/python3.7/site-packages/')


Solution

  • When a Python interpreter starts, it builds a list of the directories it will search for modules on import. This list is available as sys.path. Running the following in Colab shows where Python looks for modules:

    import sys
    sys.path
    
    >>> ['',
    '/env/python',
     '/usr/lib/python36.zip',
     '/usr/lib/python3.6',
     '/usr/lib/python3.6/lib-dynload',
     '/usr/local/lib/python3.6/dist-packages',
     '/usr/lib/python3/dist-packages',
     '/usr/local/lib/python3.6/dist-packages/IPython/extensions',
     '/root/.ipython']
    

    The issue is that conda installs packages into a directory that is not in sys.path ('/usr/local/lib/python{pyversion}/site-packages/'), so Python cannot locate them. The fix is to append that directory to sys.path; Python then knows where to look for the package, in this case RDKit. We can confirm this by checking where rdkit was imported from:

    sys.path.append('/usr/local/lib/python3.7/site-packages/')
    
    import rdkit
    rdkit.__file__
    
    >>> '/usr/local/lib/python3.7/site-packages/rdkit/__init__.py'
    
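The same effect can be reproduced without conda or RDKit. The sketch below is only an illustration: a temporary directory stands in for conda's site-packages, and both the helper ensure_on_path and the module name demo_pkg_outside_path are made up for the example. It shows that the import fails while the directory is missing from sys.path and succeeds once it is appended.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def ensure_on_path(directory):
    """Append directory to sys.path unless it is already listed.

    Returns True if the path was added, False if it was already present.
    (Hypothetical helper for this demo, not part of any library.)
    """
    directory = str(directory)
    if directory in sys.path:
        return False
    sys.path.append(directory)
    return True


# Stand-in for conda's site-packages: a temporary directory holding a
# one-line module with a made-up name.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "demo_pkg_outside_path.py").write_text("VALUE = 42\n")

# The directory is not on sys.path yet, so the import cannot succeed.
try:
    import demo_pkg_outside_path
    found_before = True
except ModuleNotFoundError:
    found_before = False

# After appending the directory, the same import works.
added = ensure_on_path(demo_dir)
importlib.invalidate_caches()
import demo_pkg_outside_path

print(found_before, added, demo_pkg_outside_path.VALUE)
```

Checking membership before appending keeps sys.path free of duplicate entries when a notebook cell is run more than once.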

    Also note that the directories in sys.path are searched in order, and the search stops at the first match. Because the conda directory is appended to the end of the list, a package installed through conda that Colab already provides will be shadowed by the Colab version.
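
The search-order behaviour can be checked with a small experiment. This is a sketch under stated assumptions: two temporary directories play the roles of Colab's and conda's package directories, and the module name demo_shadowed_mod and the helpers make_copy and fresh_import are invented for the demo. It compares appending the second directory with inserting it at the front of sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE = "demo_shadowed_mod"  # made-up module name for this demo


def make_copy(origin):
    """Create a temp directory containing MODULE with ORIGIN set to origin."""
    directory = Path(tempfile.mkdtemp())
    (directory / f"{MODULE}.py").write_text(f"ORIGIN = {origin!r}\n")
    return str(directory)


def fresh_import():
    """Import MODULE from scratch, ignoring any previously loaded copy."""
    sys.modules.pop(MODULE, None)
    importlib.invalidate_caches()
    return importlib.import_module(MODULE)


colab_dir = make_copy("colab")  # plays the role of Colab's dist-packages
conda_dir = make_copy("conda")  # plays the role of conda's site-packages

sys.path.append(colab_dir)      # listed first, like Colab's own packages
sys.path.append(conda_dir)      # appended later, so it is searched last
origin_when_appended = fresh_import().ORIGIN

sys.path.remove(conda_dir)
sys.path.insert(0, conda_dir)   # moved to the front, so it is searched first
origin_when_inserted = fresh_import().ORIGIN

print(origin_when_appended, origin_when_inserted)
```

With append, the copy in the earlier directory wins; with insert(0, ...), the copy in the newly added directory wins. Whether putting the conda directory first is desirable depends on the packages involved.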