Tags: python, mysql, pyodbc, pyarrow, unixodbc

pyarrow breaks pyodbc MySQL?


I have a Docker container with the MySQL ODBC driver, unixODBC, and a bunch of Python stuff installed. The MySQL driver works through isql, and it also works from Python with pyodbc, as long as I connect in a fresh Python process:

sh-4.4# python
Python 3.8.16 (default, May 31 2023, 12:44:21)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyodbc
>>> pyodbc.connect("DRIVER=MySQL ODBC 8.1 ANSI Driver;SERVER=host.docker.internal;PORT=3306;UID=root;PWD=shh")
<pyodbc.Connection object at 0x7f6fd94dac70>

But, if I import pyarrow before establishing the connection, I get this:

sh-4.4# python
Python 3.8.16 (default, May 31 2023, 12:44:21)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> import pyodbc
>>> pyodbc.connect("DRIVER=MySQL ODBC 8.1 ANSI Driver;SERVER=host.docker.internal;PORT=3306;UID=root;PWD=shh")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/usr/lib64/libmyodbc8a.so' : file not found (0) (SQLDriverConnect)")

I get the same if I specify the path to the driver directly:

sh-4.4# python
Python 3.8.16 (default, May 31 2023, 12:44:21)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow
>>> import pyodbc
>>> pyodbc.connect("DRIVER=/usr/lib64/libmyodbc8a.so;SERVER=host.docker.internal;PORT=3306;UID=root;PWD=shh")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/usr/lib64/libmyodbc8a.so' : file not found (0) (SQLDriverConnect)")

I have run into the same or a similar unixODBC error message in the past when a transitive dependency of the library was missing, so I tried the following to see whether something had messed up the loader search path. I'm not sure it's a valid test, but nothing seems amiss:

>>> import os
>>> os.system('./lddtree.sh /usr/lib64/libmyodbc8a.so')
libmyodbc8a.so => /usr/lib64/libmyodbc8a.soreadelf: /usr/lib64/libmyodbc8a.so: Warning: Section '.interp' was not dumped because it does not exist!
 (interpreter => none)
readelf: /usr/lib64/libmyodbc8a.so: Warning: Section '.interp' was not dumped because it does not exist!
    libpthread.so.0 => /lib64/libpthread.so.0
    libdl.so.2 => /lib64/libdl.so.2
    libssl.so.1.1 => /lib64/libssl.so.1.1
        libz.so.1 => /lib64/libz.so.1
    libcrypto.so.1.1 => /lib64/libcrypto.so.1.1
    libresolv.so.2 => /lib64/libresolv.so.2
    librt.so.1 => /lib64/librt.so.1
    libm.so.6 => /lib64/libm.so.6
    libodbcinst.so.2 => /lib64/libodbcinst.so.2
        libltdl.so.7 => /lib64/libltdl.so.7
    libstdc++.so.6 => /lib64/libstdc++.so.6
    libgcc_s.so.1 => /lib64/libgcc_s.so.1
    libc.so.6 => /lib64/libc.so.6
    ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
0
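
A complementary check, sketched here as a suggestion and not part of the original post: lddtree only resolves dependencies on disk, so it cannot show a failure that happens at load time inside the running process. Calling dlopen through ctypes in the same interpreter, after importing pyarrow, would be expected to raise an OSError carrying the loader's own message instead of unixODBC's generic "file not found". The helper name try_dlopen is hypothetical.

```python
import ctypes


def try_dlopen(path):
    """Return None if the library loads, otherwise the loader's error text."""
    try:
        # ctypes.CDLL calls dlopen() directly, so the OSError message is the
        # dynamic loader's own explanation rather than the driver manager's.
        ctypes.CDLL(path)
    except OSError as exc:
        return str(exc)
    return None


# Intended use in the failing session (driver path taken from the question):
#   import pyarrow
#   print(try_dlopen("/usr/lib64/libmyodbc8a.so"))
```

If the call returns None in a fresh process but an error string after importing pyarrow, the problem lies in the load itself and not in the search path.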

I tried upgrading pyodbc and pyarrow to latest, and behavior is the same:

pyarrow             13.0.0
pyodbc              4.0.39

I'm not sure the issue is specific to pyarrow. Based on this bug reporting similar behavior when importing protobuf, I searched my libraries for anything referencing 'protobuf', and pyarrow popped up with a header file that includes it in its name. That is probably a coincidence: the file was in an older version of pyarrow, and the latest version no longer even has it.

FWIW, the container also has other ODBC drivers that don't experience this issue.

I assume pyarrow init is changing something in the environment, but I'm not enough of a Pythonista to know how to identify what; any tips to debug further?
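
One way to look for such changes, as a rough sketch that assumes Linux (it reads /proc/self/maps): snapshot the environment variables and the shared objects mapped into the process, import the suspect module, then compare. The function names snapshot and diff are made up for illustration.

```python
import os


def snapshot():
    """Capture environment variables and mapped shared objects (Linux only)."""
    libs = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                # Fields: address, perms, offset, dev, inode, optional pathname
                fields = line.split(None, 5)
                if len(fields) == 6 and ".so" in fields[5]:
                    libs.add(fields[5].strip())
    except OSError:
        pass  # /proc is not available on this platform
    return {"env": dict(os.environ), "libs": libs}


def diff(before, after):
    """Report environment variables that changed and newly mapped libraries."""
    keys = set(before["env"]) | set(after["env"])
    env_changed = {
        key: (before["env"].get(key), after["env"].get(key))
        for key in keys
        if before["env"].get(key) != after["env"].get(key)
    }
    return {"env_changed": env_changed, "new_libs": after["libs"] - before["libs"]}


# Intended use:
#   before = snapshot()
#   import pyarrow
#   print(diff(before, snapshot()))
```

This only shows what an import changed in the environment and which libraries it pulled in. It does not explain why a later load fails.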


Solution

  • In the end, this turned out to be exhaustion of the 2048 bytes allocated to TLS (Thread-Local Storage) for dynamically loaded libraries. The libarrow.so that ships with pyarrow is a pig when it comes to this block of memory, so loading it before the MySQL driver was loaded via pyodbc caused libmyodbc8a.so to push usage over that limit.

    Statically preloading libarrow.so by adding it to the LD_PRELOAD environment variable resolved the issue for me. (I first tried preloading libmyodbc8a.so, but that led to some other issues I didn't bother to track down - might as well focus on pyarrow since it's the memory hog anyway.)

    (libtool is really unhelpful with diagnostics. I ended up compiling a version locally with the macro LT_DEBUG_LOADERS set and running with that to get the root-cause error printed to STDERR: "cannot allocate memory in static TLS block".)
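
The LD_PRELOAD workaround above can also be applied from inside Python. This is a minimal sketch, not what the answer actually ran. LD_PRELOAD is read by the dynamic loader at process start, so setting it in os.environ after startup has no effect on the current process; the sketch re-executes the interpreter instead. The function names are hypothetical, and the libarrow path in the usage comment is a placeholder that has to be replaced with the real location of libarrow.so in the pyarrow installation.

```python
import os
import sys


def with_preload(lib_path, env=None):
    """Return a copy of env with lib_path placed first in LD_PRELOAD."""
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("LD_PRELOAD", "")
    # Assumes colon-separated entries; the loader also accepts spaces.
    entries = [p for p in existing.split(":") if p and p != lib_path]
    new_env["LD_PRELOAD"] = ":".join([lib_path] + entries)
    return new_env


def reexec_with_preload(lib_path):
    """Restart the current script once with lib_path preloaded."""
    if lib_path in os.environ.get("LD_PRELOAD", "").split(":"):
        return  # already preloaded; avoid an exec loop
    os.execve(sys.executable, [sys.executable] + sys.argv, with_preload(lib_path))


# Intended use at the very top of a script, before importing pyarrow or pyodbc
# (placeholder path, not taken from the original post):
#   reexec_with_preload("/path/to/pyarrow/libarrow.so")
```

Setting LD_PRELOAD in the Dockerfile or in the shell that launches Python achieves the same result without the re-exec.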