
Importing pandas and cplex in a conda environment raises an ImportError: libstdc++.so.6: version `GLIBCXX_3.4.29' not found


Importing the Python libraries pandas and cplex in a conda environment raises the following exception:

ImportError, because 'GLIBCXX_3.4.29 not found'

When importing pandas first, there is no error (see the examples further down). Everything is done in a conda environment.

cplex is a proprietary library (there is a free academic version for students), but I am pretty sure that this issue is more general and can happen with any two libraries that use C++.

Many others seem to run into similar issues with other packages: reticulate, the R interface to Python, often seems to cause the same issue (GitHub tickets: #841, #1282). tensorflow can also cause the same error. Julia also seems to have been affected, but the issue is already fixed there.

I think the error ImportError: /lib64/libstdc++.so.6: version 'CXXABI_1.3.9' not found might be caused by a very similar situation (see: 1, 2, 3).

The problem is always the same: a Python package which contains C++ code is used in a conda environment.

What is happening here and what is the proper solution to this situation?

Setup of a minimal example

Tested using Ubuntu 20.04.6 LTS. Pop!_OS 22.04 LTS seems not to be affected.

Create a conda environment and install pandas:

conda create -n glibcxx_test
conda activate glibcxx_test

# Python 3.10 is necessary, because cplex does not support Python 3.11 yet
mamba install -c conda-forge pandas python=3.10 pyarrow

To install cplex follow these instructions:

For quick access to CPLEX Optimization Studio through this program, go to http://ibm.biz/CPLEXonAI. Click on Software, then you'll find, in the ILOG CPLEX Optimization Studio card, a link to register. Once your registration is accepted, you will see a link to download of the AI version.

Note that after clicking the download link, you need to select "HTTP" as download method if you don't want to use the Download director. Select the version of the CPLEX Optimization Studio which suits your OS and then click download.

Make the file executable, run it and follow the instructions of the installer:

chmod +x ~/Downloads/cplex_studio2211.linux_x86_64.bin
~/Downloads/cplex_studio2211.linux_x86_64.bin

It does not seem to make a difference if the conda environment is activated before running the installer.

Note that you don't need root permissions if you install it to your home folder, e.g. /home/YOUR_USER/cplex_studio2211.

The installer will print out a command to install the Python package to access CPLEX via a Python API. Activate the conda environment and then install the cplex package:

conda activate glibcxx_test
python /home/YOUR_USER/cplex_studio2211/python/setup.py install

Error message and more debugging details

Importing cplex first and pandas afterwards raises the ImportError, mentioning that 'GLIBCXX_3.4.29 not found':

$ python -c 'import cplex; import pandas'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/__init__.py", line 49, in <module>
    from pandas.core.api import (
  File "/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/core/api.py", line 47, in <module>
    from pandas.core.groupby import (
  File "/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (
  File "/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/core/groupby/generic.py", line 68, in <module>
    from pandas.core.frame import DataFrame
  File "/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/core/frame.py", line 149, in <module>
    from pandas.core.generic import (
  File "/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/core/generic.py", line 193, in <module>
    from pandas.core.window import (
  File "/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/core/window/__init__.py", line 1, in <module>
    from pandas.core.window.ewm import (
  File "/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/core/window/ewm.py", line 11, in <module>
    import pandas._libs.window.aggregations as window_aggregations
ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/_libs/window/aggregations.cpython-310-x86_64-linux-gnu.so)

Importing pandas first seems to work fine without error:

$ python -c 'import pandas; import cplex'  # no error!

Setting LD_LIBRARY_PATH explicitly also seems to solve the issue:

$ LD_LIBRARY_PATH=$HOME/.conda/envs/glibcxx_test/lib/ python -c 'import cplex; import pandas'  # no error!
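To see which copy of libstdc++ actually ends up in the process for a given import order, one can look at the memory map of the running interpreter. The following is a minimal sketch, assuming Linux (it reads /proc/self/maps); the helper names and the script name are made up for illustration:

```python
import importlib
import sys


def parse_maps(lines, name_fragment):
    """Return the sorted paths of mapped files whose path contains name_fragment."""
    paths = set()
    for line in lines:
        fields = line.split(None, 5)
        # The sixth field, if present, is the path of the mapped file.
        if len(fields) == 6 and name_fragment in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_libraries(name_fragment, maps_path="/proc/self/maps"):
    """List the shared objects mapped into this process that match name_fragment."""
    try:
        with open(maps_path) as maps:
            return parse_maps(maps, name_fragment)
    except OSError:
        return []  # /proc/self/maps only exists on Linux


if __name__ == "__main__":
    # Modules are imported in the order given on the command line.
    for module_name in sys.argv[1:]:
        importlib.import_module(module_name)
        print(module_name, "->", loaded_libraries("libstdc++"))
```

Called as `python which_libstdcxx.py cplex pandas` (hypothetical file name), it prints the libstdc++ path after each import, which should make it visible whether the system copy or the conda copy was picked up first.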

It seems as if pandas is linked to a newer libstdc++.so.6 library than py310_cplex2211.so:

$ ldd /home/MY_USERNAME/cplex_studio2211/cplex/python/3.10/x86-64_linux/build/lib/cplex/_internal/py310_cplex2211.so
    linux-vdso.so.1 (0x00007ffcfb94e000)
    libcplex2211.so => /home/MY_USERNAME/cplex_studio2211/cplex/python/3.10/x86-64_linux/build/lib/cplex/_internal/libcplex2211.so (0x00007f1ee3c2f000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1ee3bf9000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1ee3bf3000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1ee3bd8000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1ee39e6000)
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1ee3802000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1ee36b3000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1ee617d000)
$ realpath /lib/x86_64-linux-gnu/libstdc++.so.6
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28

Pandas uses 6.0.32:

$ ldd /home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/_libs/window/aggregations.cpython-310-x86_64-linux-gnu.so
    linux-vdso.so.1 (0x00007ffc007ec000)
    libstdc++.so.6 => /home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/_libs/window/../../../../../libstdc++.so.6 (0x00007fef448a0000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fef4473e000)
    libgcc_s.so.1 => /home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/_libs/window/../../../../../libgcc_s.so.1 (0x00007fef44723000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fef44531000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fef44527000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fef44add000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fef44502000)
$ realpath /home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/_libs/window/../../../../../libstdc++.so.6
/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/libstdc++.so.6.0.32
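Whether a given libstdc++ file provides the symbol version named in the error message can be checked by scanning the file for `GLIBCXX_` tags, similar to the common `strings libstdc++.so.6 | grep GLIBCXX` one-liner. This is only a heuristic sketch (it looks at raw bytes, not at the ELF version tables), and the function names are invented for this example:

```python
import re
import sys


def glibcxx_versions(data):
    """Return the GLIBCXX_x.y[.z] version numbers found in raw bytes, sorted numerically."""
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data))
    as_tuples = {tuple(int(part) for part in tag.decode().split(".")) for tag in tags}
    return [".".join(map(str, version)) for version in sorted(as_tuples)]


def provides(path, wanted):
    """True if the file at path mentions the version tag GLIBCXX_<wanted>."""
    with open(path, "rb") as library:
        return wanted in glibcxx_versions(library.read())


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python check_glibcxx.py /lib/x86_64-linux-gnu/libstdc++.so.6 3.4.29
    print(provides(sys.argv[1], sys.argv[2]))
```

Run against the two paths from the ldd output above, I would expect the system library to lack 3.4.29 and the conda library to contain it, but I have not included output here because it depends on the machine.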

The reason seems to be that py310_cplex2211.so specifies a RUNPATH=[$ORIGIN]:

$ readelf -d /home/MY_USERNAME/cplex_studio2211/cplex/python/3.10/x86-64_linux/build/lib/cplex/_internal/py310_cplex2211.so

Dynamic section at offset 0x131d00 contains 30 entries:
  Tag        Type                         Name/Value
 [...]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
 [...]

...but Pandas uses an RPATH=[$ORIGIN/../../../../..]:

$ readelf -d /home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/_libs/window/aggregations.cpython-310-x86_64-linux-gnu.so

Dynamic section at offset 0x54790 contains 28 entries:
  Tag        Type                         Name/Value
 [...]
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../../../../..]
 [...]
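To compare the RPATH and RUNPATH entries of several shared objects without reading the full readelf output each time, the relevant lines can be extracted with a few lines of Python. This is a sketch that assumes `readelf` from binutils is on the PATH and that its output has the format shown above; the function names are my own:

```python
import re
import subprocess

_TAG = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def search_path_tags(readelf_output):
    """Map 'RPATH' / 'RUNPATH' to the list of directories found in `readelf -d` output."""
    tags = {}
    for line in readelf_output.splitlines():
        match = _TAG.search(line)
        if match:
            tags[match.group(1)] = match.group(2).split(":")
    return tags


def inspect(path):
    """Run `readelf -d` on path and return its RPATH / RUNPATH entries."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return search_path_tags(result.stdout)
```

Calling `inspect()` on py310_cplex2211.so and on the pandas extension module should reproduce the difference shown above in a form that is easy to print side by side.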

Solution

  • What is happening?

    The issue arises from the conflict between the versions of libstdc++.so required by Pandas and CPLEX when both are imported into the same Python process. Pandas requires a newer version, while CPLEX loads an older version from the system path.

    Conda packages use a libstdc++.so shipped by conda:

    $ ls -al .conda/envs/glibcxx_test/lib/libstdc++.so
    lrwxrwxrwx 1 MY_USERNAME user 19 Feb  1 17:10 .conda/envs/glibcxx_test/lib/libstdc++.so -> libstdc++.so.6.0.32
    

    However, if cplex is imported first, the system's libstdc++.so is loaded, which is too old for Pandas in this case. If pandas is imported subsequently, a libstdc++.so has been loaded already and therefore the library is not loaded from the conda environment and Pandas raises the error, because some of the symbols are not found in the old library.
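The "first one loaded wins" behaviour described above can be illustrated by loading the environment's libstdc++ explicitly before anything else gets a chance to pull in the system copy, which has the same effect as importing pandas first. This is only a sketch to demonstrate the mechanism, not something I have tested with cplex; it assumes the usual conda layout (`<prefix>/lib/libstdc++.so.6`) and the function names are hypothetical:

```python
import ctypes
import os
import sys


def conda_libstdcxx(prefix=sys.prefix):
    """Path where a conda environment is assumed to keep its libstdc++."""
    return os.path.join(prefix, "lib", "libstdc++.so.6")


def preload_conda_libstdcxx(prefix=sys.prefix):
    """Load the environment's libstdc++ into the process, if it exists.

    Returns True if the library was found and loaded, False otherwise.
    """
    path = conda_libstdcxx(prefix)
    if not os.path.exists(path):
        return False
    # RTLD_GLOBAL makes the symbols available to libraries loaded later.
    ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    return True


# Intended use, before any C++ extension module is imported:
#   preload_conda_libstdcxx()
#   import cplex
#   import pandas
```

Because the dynamic linker reuses an already loaded libstdc++.so.6, later imports should resolve against the newer copy. The import order then no longer matters, but every entry point of the program has to remember to call the helper first.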

    Which version of a shared library gets loaded is determined by walking a search path at runtime (you can find here a good explanation about shared libraries and the runtime search path). Executables and shared libraries can contain a RUNPATH or an RPATH entry (set at link time). If an executable or shared library depends on another shared library, that library is searched for in the RPATH, in the paths listed in the environment variable $LD_LIBRARY_PATH, in the RUNPATH, in the folders configured in /etc/ld.so.conf.d/, and finally in OS-specific default paths.

    man dlopen says:

    • (ELF only) If the calling object (i.e., the shared library or executable from which dlopen() is called) contains a DT_RPATH tag, and does not contain a DT_RUNPATH tag, then the directories listed in the DT_RPATH tag are searched.
    • If, at the time that the program was started, the environment variable LD_LIBRARY_PATH was defined to contain a colon-separated list of directories, then these are searched. (As a security measure, this variable is ignored for set-user-ID and set-group-ID programs.)
    • (ELF only) If the calling object contains a DT_RUNPATH tag, then the directories listed in that tag are searched.
    • The cache file /etc/ld.so.cache (maintained by ldconfig(8)) is checked to see whether it contains an entry for filename.
    • The directories /lib and /usr/lib are searched (in that order).

    Note that /etc/ld.so.cache is a binary cache that ldconfig generates from the directories listed in /etc/ld.so.conf and /etc/ld.so.conf.d/, so that lookups are faster.
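The search order from the man page excerpt can be written down as a small model to reason about which copy of a library wins. This is a simplified sketch and not how the dynamic linker is implemented: `$ORIGIN` expansion is not modelled, the cache is treated as a plain list of directories, and all names and example paths are invented:

```python
import os


def search_order(rpath=(), runpath=(), ld_library_path="", cached=(),
                 defaults=("/lib", "/usr/lib")):
    """Return (source, directory) pairs in the order described by man dlopen.

    All arguments are plain data; nothing is read from the real system.
    """
    order = []
    # DT_RPATH only counts when the object has no DT_RUNPATH.
    if rpath and not runpath:
        order += [("DT_RPATH", d) for d in rpath]
    order += [("LD_LIBRARY_PATH", d) for d in ld_library_path.split(":") if d]
    order += [("DT_RUNPATH", d) for d in runpath]
    order += [("ld.so.cache", d) for d in cached]
    order += [("default", d) for d in defaults]
    return order


def first_hit(library, order, exists=os.path.exists):
    """Return (source, full path) of the first directory that contains library."""
    for source, directory in order:
        candidate = os.path.join(directory, library)
        if exists(candidate):
            return source, candidate
    return None
```

In this model, an object with only a RUNPATH (like py310_cplex2211.so) can be overridden by LD_LIBRARY_PATH, while an object with an RPATH (like the pandas extension) consults its RPATH before anything else. That matches the observation that setting LD_LIBRARY_PATH fixes the cplex-first import order.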

    Conda seems to ship all binaries (executables and shared libraries) with an RPATH that points to the conda environment's lib folder:

    $ readelf -d /home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/pandas/_libs/window/aggregations.cpython-310-x86_64-linux-gnu.so
    
    Dynamic section at offset 0x54790 contains 28 entries:
      Tag        Type                         Name/Value
     [...]
     0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../../../../..]
     [...]
    

    (This does not seem to be documented. $LD_LIBRARY_PATH is not set by conda, maybe to allow running binaries from the system (e.g. bash) which are compiled against the system's libraries. /etc/ld.so.conf.d/ cannot be set for the conda environment only, as far as I understand.)

    That means, for pandas the libraries from the conda environment are loaded. However, in the case of cplex, the binaries are not specifically built for a conda environment - they were installed in the environment using the setup.py script.

    In essence, without further tweaking, one cannot use binaries in a conda environment if they are not built to be used that way.

    This blog post explains the error message in more detail and mentions that it usually happens if you use binaries built for a newer version of your OS. However, in our case, we are in the situation where pandas is built for the version of libstdc++ shipped by conda, and cplex actually works with the OS version, but if both Python modules are imported in the same process, the old system version is loaded.

    Possible Solutions

    Sorted by level of recommendation: most recommended first, least recommended last.

    Install all libraries via conda

    cplex can simply be installed via conda instead of downloading the binary package and then running setup.py:

    conda install ibmdecisionoptimization::cplex
    

    Note: This installs the community edition of cplex, which is limited in terms of allowed problem size. It seems as if you cannot install the free student version via conda.

    This way the RPATH field is set properly and the correct version of libstdc++ (the one in the conda env) is found and loaded:

    $ readelf -d /home/MY_USERNAME/.conda/envs/MY_ENV/lib/python3.10/site-packages/cplex/_internal/libcplex2210.so | grep RPATH
     0x000000000000000f (RPATH)              Library rpath: [$ORIGIN:$ORIGIN/../../../..]
    

    This is definitely the best solution if conda provides a package for the library you need to use.

    Note that there is also a version of Gurobi available via conda:

    mamba install gurobi::gurobi
    

    Set LD_LIBRARY_PATH

    The easiest solution is to set LD_LIBRARY_PATH to the lib folder of the conda environment:

    export LD_LIBRARY_PATH=/home/MY_USERNAME/.conda/envs/glibcxx_test/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    

    You can also use conda to set environment variables - see here, here and the official documentation.

    Note: As pointed out in a comment here, environment variables are passed on to child processes, which can cause problems if those processes are not compiled against the new version of libstdc++ and it is not backward compatible for some reason (I'm not sure why things should not be backward compatible). The next two solutions (rebuilding from source and ELF patching) are not affected by this issue.
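One way to limit the side effect mentioned in the note is to set the variable only for the one child process that needs it, instead of exporting it in the shell. A minimal sketch, assuming the usual conda layout (`<prefix>/lib`); the helper names are hypothetical:

```python
import os
import subprocess
import sys


def env_with_library_path(directory, base=None):
    """Copy of the environment with directory prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Avoid a trailing ':' when the variable was empty or unset.
    env["LD_LIBRARY_PATH"] = directory + (":" + existing if existing else "")
    return env


def run_with_conda_libs(code, prefix=sys.prefix):
    """Run a Python snippet in a child process that searches <prefix>/lib first."""
    env = env_with_library_path(os.path.join(prefix, "lib"))
    return subprocess.run([sys.executable, "-c", code], env=env, check=True)


# Example: run_with_conda_libs("import cplex; import pandas")
```

The parent process and any other programs started from the shell keep their original environment. Processes spawned by the child itself still inherit the modified variable.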

    See also: comment on Github, SO Q&A, SO Q&A

    Rebuild binary from source and set the RUNPATH accordingly

    This seems to be the proper solution, but you need to have the source code and it might involve some hassle: re-compile all binaries, link them to the new version of libstdc++ and set the RUNPATH accordingly. This SO Q&A has an example:

    g++ main.o -o myapp ... \
       -Wl,--rpath=/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/ \
       -Wl,--dynamic-linker=/home/MY_USERNAME/.conda/envs/glibcxx_test/lib/ld-linux.so.2
    

    (This example is not a complete solution and has not been tested for this answer.)

    Re-link the binary if the source is not available

    It is possible to overwrite the RPATH in a compiled binary using patchelf:

    $ patchelf --set-rpath '$ORIGIN/../../../..'  /home/MY_USERNAME/.conda/envs/glibcxx_test/lib/python3.10/site-packages/cplex/_internal/libcplex2211.so
    

    Note that this will need to be re-done if you re-install cplex.

    Found in this SO Q&A.

    Downgrade the culprit

    libstdcxx-ng is the name of the conda package which ships libstdc++.so. One can downgrade this package to a lower version such that versions of libstdc++.so in the conda environment and the system match up. However, this will also downgrade pandas - since the most recent version of Pandas requires a newer libstdc++.so. If you don't need a recent version of Pandas, this might be an option for you:

    $ mamba install -c conda-forge libstdcxx-ng=11
    conda-forge/linux-64                                        Using cache
    conda-forge/noarch                                          Using cache
    
    Pinned packages:
      - python 3.10.*
    
    
    Transaction
    
      Prefix: /home/MY_USERNAME/.conda/envs/glibcxx_test
    
      Updating specs:
    
       - libstdcxx-ng=11
    
    
      Package         Version  Build            Channel           Size
    ────────────────────────────────────────────────────────────────────
      Downgrade:
    ────────────────────────────────────────────────────────────────────
    
      - libstdcxx-ng   12.3.0  h0f45ef3_3       conda-forge     Cached
      + libstdcxx-ng   11.4.0  h4dcbe23_3       conda-forge        3MB
      - numpy          1.26.3  py310hb13e2d6_0  conda-forge     Cached
      + numpy          1.22.3  py310h4ef5377_2  conda-forge        7MB
      - pandas          2.1.4  py310hcc13569_0  conda-forge     Cached
      + pandas          1.4.2  py310h769672d_1  conda-forge       13MB
    

    Note that it can be a bit of a hassle to find the corresponding versions. There is a separate version number for the conda package libstdcxx-ng (something like 12.3.0, which corresponds to the GCC release), for the library file libstdc++ (something like 6.0.30) and for the GLIBCXX symbol version (something like 3.4.29 - this is the one in the error message). This SO Q&A explains that the GCC release notes help to match up GCC and libstdc++ library versions. This SO Q&A explains how to find which files are part of an installed conda package:

    ${CONDA_PREFIX}/conda-meta/libstdcxx-ng-13.2.0-h7e041cc_4.json
    

    There you should find an entry "files", which contains something like "lib/libstdc++.so.6.0.32". I haven't found a way to directly pick the right package without installing it.
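Looking up the "files" entry by hand can be automated. The following sketch lists the libstdc++ files recorded for every installed libstdcxx-ng build in an environment; it assumes the metadata layout described above (one JSON file per package in conda-meta, with a "files" list), and the function name is made up:

```python
import glob
import json
import os
import sys


def shipped_files(conda_meta_dir, package, fragment):
    """Return {metadata file name: [entries of its "files" list containing fragment]}."""
    result = {}
    pattern = os.path.join(conda_meta_dir, package + "-*.json")
    for meta_path in sorted(glob.glob(pattern)):
        with open(meta_path) as handle:
            record = json.load(handle)
        matches = [name for name in record.get("files", []) if fragment in name]
        result[os.path.basename(meta_path)] = matches
    return result


if __name__ == "__main__":
    # Falls back to the running interpreter's prefix if no environment is active.
    meta_dir = os.path.join(os.environ.get("CONDA_PREFIX", sys.prefix), "conda-meta")
    print(shipped_files(meta_dir, "libstdcxx-ng", "libstdc++.so"))
```

This only covers packages that are already installed, so it does not remove the need to install a candidate version before checking it.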

    See also this Github comment and this Github comment.

    Upgrade your system libstdc++

    If possible, you could simply upgrade the libstdc++.so provided by your OS. However, the whole point of conda environments is to be able to use certain libraries in a different version than the one shipped by your OS. Also, root permissions are required. At least there is a PPA for Ubuntu that makes the upgrade a bit more convenient:

    sudo add-apt-repository ppa:ubuntu-toolchain-r/test
    sudo apt-get update
    sudo apt-get install gcc-4.9
    sudo apt-get install --only-upgrade libstdc++6
    

    See also this comment on Github, this comment on Github and this SO Q&A.

    Install the conda package libstdcxx-ng

    I don't see how this should help, but some people report that installing the conda package libstdcxx-ng fixed their issue:

    conda install -c conda-forge libstdcxx-ng
    

    To my understanding, this can only help if a conda package's dependency on libstdcxx-ng is wrong or missing. Installing this package places libstdc++.so in the conda environment's lib folder. However, if a binary does not have this lib folder in its library search path (as is the case for cplex here), installing the package won't solve anything.

    Symlink the system library to the conda library

    Some people suggest placing a symlink to a newer version of libstdc++.so in the system library path /lib/x86_64-linux-gnu/. This seems to be a really bad idea, don't try this at home: especially if the newer version of the library is located in the home directory of some user, like /home/runner/.local/share/r-miniconda/pkgs/libstdcxx-ng-12.1.0-ha89aaad_16/lib/, I assume that other users will not be able to access it, which could make the system unusable for them.

    However, inside a docker container it might be a suitable solution.

    See also: Github comment