Search code examples
pysparkazure-sql-databasepyodbcazure-databricks

Using PYODBC to execute query on Azure SQL in Databricks


I'm trying to run some insert query in a Azure SQL server in a Databricks script using pyspark. I know there is the spark native connector, but ,for my understanding, the option given are just to append or overwrite the table. I need to update the table and not every row of it are affected by the update.

I tried using the python pyodbc library but when making the connection with

cnx = pyodbc.connect('DRIVER={};SERVER={};DATABASE={};Trusted_Connection=yes;user={},password={} '.format(driver,jdbcHostname_dev,jdbcDatabase_dev,sql_user,sql_password))

i receive the following error

('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'SQL Server' : file not found (0) (SQLDriverConnect)")

I tried the answer marked as solution from using pyodbc in azure databrick for connecting with SQL server, but this not resolve the problem for me.

How can I use pyodbc in databricks?


Solution

  • ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'SQL Server' : file not found (0) (SQLDriverConnect)")

    The above error mainly happens because drivers are not properly installed.

    I tried to reproduce the same in my environment and got the below results:

    enter image description here

    To resolve the above error. Please follow this approach:

    Run the following command to install ODBC drivers on the Azure Databricks cluster.

    %sh
    curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
    curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
    sudo apt-get update
    sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
    

    enter image description here

    Now, you can check Azure Databricks connected to SQL server.

    Code:

    import pyodbc
    server = '<Your_server_name>'
    database = '<database_name>'
    username = '<username>'
    password = '<password>'
        
    conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE='+ database +';UID=' + username + ';PWD='+ password)
    

    enter image description here