Search code examples
databaseoracledatabricksazure-databrickscx-oracle

cx_Oracle in Azure Databricks


I am unable to establish connection to my Oracle database from Azure Databricks although it works in ADF where I am able to query the table. But ADF takes time to filter the records so I am still trying to connect from Databricks.

I followed the steps from this Microsoft link, both manually and using init-script but error seems to persist.

When I looked into my cluster event log it says the init-script execution was successfully.

Even-logs

Error message when I tried to establish the connection: DPI-1047: Cannot locate a 64-bit Oracle Client library: "/databricks/driver/oracle_ctl//lib/libclntsh.so: cannot open shared object file: No such file or directory".

When I executed the following command

dbutils.fs.ls("/databricks/driver/")

there was no such directory

This triggered me to post some questions here:

  1. Does this mean the init-script did not perform its job?

  2. Is /databricks/driver/oracle_ctl a hidden directory for dbutils.fs.ls?

  3. Error message points to /databricks/driver/oracle_ctl//lib/libclntsh.so, when I manually inspected the downloaded oracle client, there is no such folder called lib although libclntsh.so exists in the main directory. Is there a problem that databricks is checking the wrong directory for the libclntsh.so?

  4. Does this connections still works for others?

Syntax for connection: cx_Oracle.connect(user= user_name, password= password,dsn= IP+':'+Port+'/'+DB_name)

Above syntax works fine when connected from inside a on-premises machine.


Solution

  • Changed the path from "/databricks/driver/oracle_ctl/" to "/databricks/driver/oracle_ctl/instantclient" in the init-script and that error does not appear anymore.

    Please use the following init script instead

    dbutils.fs.put("dbfs:/databricks/<init-script-folder-name>/oracle_ctl.sh","""
    #!/bin/bash
    sudo apt-get install libaio1
    wget --quiet -O /tmp/instantclient-basiclite-linuxx64.zip https://download.oracle.com/otn_software/linux/instantclient/instantclient-basiclite-linuxx64.zip
    unzip /tmp/instantclient-basiclite-linuxx64.zip -d /databricks/driver/oracle_ctl/
    mv /databricks/driver/oracle_ctl/instantclient* /databricks/driver/oracle_ctl/instantclient
    sudo echo 'export LD_LIBRARY_PATH="/databricks/driver/oracle_ctl/instantclient/"' >> /databricks/spark/conf/spark-env.sh
    sudo echo 'export ORACLE_HOME="/databricks/driver/oracle_ctl/instantclient/"' >> /databricks/spark/conf/spark-env.sh
    """, True)
    

    Notes:

    1. The above init-script was advised by a databricks employee and can be found here.

    2. As mentioned by Christopher Jones in one of the comments, cx_Oracle has been recently upgraded to oracledb with a thin and thick version.