I am trying to connect oracle database from AWS glue using cx_oracle but i am getting this error message
DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory". See https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html for help
I tried to downoad the so files as per the document and stored it in S3 which has been linked to Glue with --extra-files parameter but still getting same error message
I have tried this stackoverflow question and also tried to set rpath with s3 url but no success. Any idea would really help
This answer is a compilation of this and this and a lot of discussion around the former in the comments. Credit for the rpath
patching solution goes to @harjeet-singh, the original author of the abovementioned answer and for the libaio
to @good-will, but there are still some steps around these solutions that keep confusing people, so that is why I am going to consolidate everything in a single step-by-step answer here.
In order to connect to an Oracle database using cx-Oracle
from a Python shell AWS Glue job, we need to bundle the oracle client libraries with it. Furthermore, the libraries have to be patched with a correct rpath
in order to load correctly, because in Glue runtime we have filesystem write access only to /tmp
, which is where our archive will be, but cx-Oracle
can't know that and expects a different directory by default. And the LD_LIBRARY_PATH
hack cannot be implemented because we have no control over how the Glue job is launched.
wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basic-linuxx64.zip
unzip instantclient-basic-linuxx64.zip
libclntsh.so.21.1
to the one that is going to be looked up when you use cx-Oracle
: libclntsh.so
). This is done because whatever loads these libraries dynamically apparently does not resolve symlinks. Maybe it will in the future, but I had to do this to make it work.cd instantclient_21_5/
find . -type l -name "libclntsh.so*" -delete
mv libclntsh.so.21.1 libclntsh.so
Do the same with the other files with symlinks if after completing the whole guide and running your job you still have problems with libraries like
DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libsomething.so: cannot open shared object file: No such file or directory"
rpath
to point to a static directory that we are going to be using from inside the Glue job
For example, if your archive is named instant-client-basic-linux.x64-21.5.0.0.0
and it contains a folder named instantclient_21_5
with all the libraries. When the job runs, this archive is going to be available in a random directory under /tmp
(more on that below. We need to find our archive in one of those directories and extract it to a static directory under /tmp
, for example /tmp/libs
. So then, your rpath
would be /tmp/libs/instant-client-basic-linux.x64-21.5.0.0.0/instantclient_21_5
, because that in the absolute path to client libraries.sudo apt-get update
sudo apt-get install patchelf -y
patchelf --set-rpath /tmp/libs/instant-client-basic-linux.x64-21.5.0.0.0/instantclient_21_5 libclntsh.so
libaio.so.1
in the archivecd ..
wget https://src.fedoraproject.org/lookaside/pkgs/libaio/libaio-0.3.110.tar.gz/2a35602e43778383e2f4907a4ca39ab8/libaio-0.3.110.tar.gz
tar xzvf libaio-0.3.110.tar.gz
cd libaio-0.3.110
make prefix=`pwd`/usr install
find ./usr/lib/ -type l -name "libclntsh.so*" -delete
mv ./usr/lib/libaio.so.1.0.1 ../instantclient_21_5/libaio.so.1
Note: you might want to check with a newer version of libaio
if there is one.
6. Zip the archive
cd ..
zip -T -r instantclient-basic-linuxx64_patched.zip instantclient_21_5/
cx-Oracle
wheelwget https://files.pythonhosted.org/packages/a9/b7/c2d0223fb4f1013b090cf82f3ce56f36f33b79a48f9c33b36717c2977b04/cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
cx-Oracle
to S3aws s3 cp instantclient-basic-linuxx64_patched.zip s3://<mybucket>/glue_libs
aws s3 cp cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl s3://<mybucket>/glue_libs
I assume your glue job is already created and we are going to just configure it.
cx-Oracle
wheel in "Python library path" configuration parametercx-Oracle
. It can be executed after import, but before the usage.Iterate through random directories in /tmp
, find an archive you created and extract it to a static directory that we set in rpath
previously. Then initialize the cx-Oracle
client and you are good to go.
Here is an example of an implementation:
import zipfile
from pathlib import Path
import cx_Oracle
filename = 'instantclient-basic-linuxx64_patched.zip'
oracle_archive = next(Path('./tmp').glob(f'**/{filename}'))
with zipfile.ZipFile(oracle_archive, 'r') as f:
Path('./tmp/libs').mkdir()
f.extractall('./tmp/libs')
cx_Oracle.init_oracle_client(lib_dir=f'/tmp/libs/{filename}/instantclient_21_5')
Run this on your Linux machine (substitute your bucket name in the end):
wget https://download.oracle.com/otn_software/linux/instantclient/instantclient-basic-linuxx64.zip
unzip instantclient-basic-linuxx64.zip
cd instantclient_21_5/
find . -type l -name "libclntsh.so*" -delete
mv libclntsh.so.21.1 libclntsh.so
sudo apt-get update
sudo apt-get install patchelf -y
patchelf --set-rpath /tmp/libs/instant-client-basic-linux.x64-21.5.0.0.0/instantclient_21_5 instantclient_21_5/libclntsh.so
cd ..
wget https://src.fedoraproject.org/lookaside/pkgs/libaio/libaio-0.3.110.tar.gz/2a35602e43778383e2f4907a4ca39ab8/libaio-0.3.110.tar.gz
tar xzvf libaio-0.3.110.tar.gz
cd libaio-0.3.110
make prefix=`pwd`/usr install
find ./usr/lib/ -type l -name "libclntsh.so*" -delete
mv ./usr/lib/libaio.so.1.0.1 ../instantclient_21_5/libaio.so.1
cd ..
zip -T -r instantclient-basic-linuxx64_patched.zip instantclient_21_5/
wget https://files.pythonhosted.org/packages/a9/b7/c2d0223fb4f1013b090cf82f3ce56f36f33b79a48f9c33b36717c2977b04/cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl
aws s3 cp instantclient-basic-linuxx64_patched.zip s3://<mybucket>/glue_libs/
aws s3 cp cx_Oracle-8.3.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl s3://<mybucket>/glue_libs/
Follow steps 9 and 10 above to configure your job.
Hope anyone who reads this will get this right on the first try, because I sure didn't.