Search code examples
azuredatabricksazure-databrickspyodbc

install the 'ODBC Driver drive and the Databricks cluster


I had the init script on my Cluster point to DBFS path. Error showed me that DBFS is not supported anymore and I had to move it workspace or use ABFSS path. I moved to workspace

enter image description here

pyodbs.sh looks like this

#!/bin/bash
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list 

sudo apt-get update
dpkg --configure -a
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17

/databricks/python/bin/pip list | egrep 'thrift-sasl|sasl'
/databricks/python/bin/pip install --upgrade thrift
dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
sudo apt-get -y install libsasl2-dev gcc 

sudo apt-get -q -y install unixodbc unixodbc-dev
sudo apt-get -q -y install python3-dev
/databricks/python/bin/pip install pyodbc

and im running this command from notebook

%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list 
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17 
apt-get -y install unixodbc-dev
pip3 install --upgrade pyodbc

this is the error I get

enter image description here

Can anyone help me to fix this? It start failing right after i changed the init scrip path

MY Cluster config enter image description here


Solution

  • I have tried the below:

    !curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
    !curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
    !sudo apt-get update
    !sudo ACCEPT_EULA=Y apt-get install -q -y msodbcsql17
    !pip install --upgrade pyodbc
    

    In the above code I am add and verifying Microsoft GPG key for package.

    Next adding SQL Server repository for Ubuntu 16.04.

    Updating the package list & Installing the ODBC Driver 17 for SQL Server.

    Results:

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   983  100   983    0     0  10516      0 --:--:-- --:--:-- --:--:-- 10569
    Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
    OK
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100    79  100    79    0     0    888      0 --:--:-- --:--:-- --:--:--   897
    deb [arch=amd64] https://packages.microsoft.com/ubuntu/16.04/prod xenial main
    Hit:1 https://packages.microsoft.com/ubuntu/16.04/prod xenial InRelease
    Hit:2 https://repos.azul.com/zulu/deb stable InRelease
    Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease               
    Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease                         
    Hit:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
    Hit:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
    Reading package lists... Done
    W: https://packages.microsoft.com/ubuntu/16.04/prod/dists/xenial/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
    W: https://repos.azul.com/zulu/deb/dists/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
    Reading package lists...
    Building dependency tree...
    Reading state information...
    msodbcsql17 is already the newest version (17.8.1.1-1).
    0 upgraded, 0 newly installed, 0 to remove and 86 not upgraded.
    1 not fully installed or removed.
    After this operation, 0 B of additional disk space will be used.
    Setting up msodbcsql17 (17.8.1.1-1) ...
    dpkg: error processing package msodbcsql17 (--configure):
     installed msodbcsql17 package post-installation script subprocess returned error exit status 127
    Errors were encountered while processing:
     msodbcsql17
    E: Sub-process /usr/bin/dpkg returned an error code (1)
    Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
    Requirement already satisfied: pyodbc in /databricks/python3/lib/python3.10/site-packages (4.0.32)
    Collecting pyodbc
      Downloading pyodbc-5.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (334 kB)
         ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 334.7/334.7 kB 1.8 MB/s eta 0:00:00
    Installing collected packages: pyodbc
      Attempting uninstall: pyodbc
        Found existing installation: pyodbc 4.0.32
        Not uninstalling pyodbc at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-9de0cbfa-9f9d-4ae7-b838-2c6b8f93e753
        Can't uninstall 'pyodbc'. No files were found to uninstall.
    Successfully installed pyodbc-5.1.0
    
    Name: pyodbc Version: 5.1.0 Summary: DB API module for ODBC Home-page: [https://github.com/mkleehammer/pyodbc](https://github.com/mkleehammer/pyodbc) Author: Author-email: Michael Kleehammer <[email protected]> License: MIT License Location: /local_disk0/.ephemeral_nfs/envs/pythonEnv-9de0cbfa-9f9d-4ae7-b838-2c6b8f93e753/lib/python3.10/site-packages Requires: Required-by:
    
    WARNING: apt does not have a stable CLI interface. Use with caution in scripts. msodbcsql17/xenial,now 17.8.1.1-1 amd64 [installed]
    

    I agree with @Ganesh Chandrasekaran you can use the pyspark If you are connecting to SQL Server, you can bypass ODBC and use a direct connection from Databricks instead.

    Here is the example:

    jdbcHostname = "<Your Sql server>.database.windows.net"
    jdbcPort = 1433
    jdbcDatabase = "db02"
    jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};database={jdbcDatabase}"
    connectionProperties = {
        "user": "admin02",
        "password": "Welcome@1"
    }
    remote_table = (spark.read
      .format("jdbc")  
      .option("url", jdbcUrl)  
      .option("user", connectionProperties["user"])
      .option("password", connectionProperties["password"])
      .option("dbtable", "dbo.tbl02")  
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")  
      .load()
    )
    remote_table.show()
    

    Results:

    +---+-------+
    | id|   Name|
    +---+-------+
    |  1|  dilip|
    |  2|    Raj|
    |  3|Narayan|
    +---+-------+