I had the init script on my cluster pointing to a DBFS path. The error told me that DBFS is no longer supported for init scripts and that I had to move the script to a workspace path or use an ABFSS path, so I moved it to the workspace.
pyodbc.sh looks like this:
#!/bin/bash
# Add the Microsoft signing key and the SQL Server package repository
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
# Finish configuring any partially installed packages
dpkg --configure -a
# Install the Microsoft ODBC Driver 17 for SQL Server
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
# Check what is already present, then upgrade thrift
/databricks/python/bin/pip list | egrep 'thrift-sasl|sasl'
/databricks/python/bin/pip install --upgrade thrift
dpkg -l | egrep 'thrift_sasl|libsasl2-dev|gcc|python-dev'
# Build dependencies for pyodbc, then install pyodbc into the cluster's Python
sudo apt-get -y install libsasl2-dev gcc
sudo apt-get -q -y install unixodbc unixodbc-dev
sudo apt-get -q -y install python3-dev
/databricks/python/bin/pip install pyodbc
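Once this script has run, the driver should be visible to pyodbc from a notebook. A minimal sketch of the check I would run (nothing cluster-specific assumed here):

import pyodbc
# After a successful install this list should include "ODBC Driver 17 for SQL Server"
print(pyodbc.drivers())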
and I'm running these commands from a notebook:
%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17
apt-get -y install unixodbc-dev
pip3 install --upgrade pyodbc
This is the error I get:
Can anyone help me fix this? It started failing right after I changed the init script path.
I have tried the following:
!curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
!curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
!sudo apt-get update
!sudo ACCEPT_EULA=Y apt-get install -q -y msodbcsql17
!pip install --upgrade pyodbc
In the above code I am adding and verifying the Microsoft GPG key for the package repository.
Next, I add the SQL Server repository for Ubuntu 16.04.
Then I update the package list and install the ODBC Driver 17 for SQL Server.
Results:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 983 100 983 0 0 10516 0 --:--:-- --:--:-- --:--:-- 10569
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
OK
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 79 100 79 0 0 888 0 --:--:-- --:--:-- --:--:-- 897
deb [arch=amd64] https://packages.microsoft.com/ubuntu/16.04/prod xenial main
Hit:1 https://packages.microsoft.com/ubuntu/16.04/prod xenial InRelease
Hit:2 https://repos.azul.com/zulu/deb stable InRelease
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
W: https://packages.microsoft.com/ubuntu/16.04/prod/dists/xenial/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
W: https://repos.azul.com/zulu/deb/dists/stable/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
Reading package lists...
Building dependency tree...
Reading state information...
msodbcsql17 is already the newest version (17.8.1.1-1).
0 upgraded, 0 newly installed, 0 to remove and 86 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up msodbcsql17 (17.8.1.1-1) ...
dpkg: error processing package msodbcsql17 (--configure):
installed msodbcsql17 package post-installation script subprocess returned error exit status 127
Errors were encountered while processing:
msodbcsql17
E: Sub-process /usr/bin/dpkg returned an error code (1)
Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
Requirement already satisfied: pyodbc in /databricks/python3/lib/python3.10/site-packages (4.0.32)
Collecting pyodbc
Downloading pyodbc-5.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (334 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 334.7/334.7 kB 1.8 MB/s eta 0:00:00
Installing collected packages: pyodbc
Attempting uninstall: pyodbc
Found existing installation: pyodbc 4.0.32
Not uninstalling pyodbc at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-9de0cbfa-9f9d-4ae7-b838-2c6b8f93e753
Can't uninstall 'pyodbc'. No files were found to uninstall.
Successfully installed pyodbc-5.1.0
Name: pyodbc
Version: 5.1.0
Summary: DB API module for ODBC
Home-page: https://github.com/mkleehammer/pyodbc
Author:
Author-email: Michael Kleehammer <[email protected]>
License: MIT License
Location: /local_disk0/.ephemeral_nfs/envs/pythonEnv-9de0cbfa-9f9d-4ae7-b838-2c6b8f93e753/lib/python3.10/site-packages
Requires:
Required-by:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
msodbcsql17/xenial,now 17.8.1.1-1 amd64 [installed]
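Once msodbcsql17 configures cleanly (i.e. the dpkg error above goes away) and pyodbc is installed, a minimal connection from the notebook could look like this — just a sketch with placeholder server, database, and credentials, assuming the driver registers under the name "ODBC Driver 17 for SQL Server":

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net,1433;"  # placeholder
    "DATABASE=<your-database>;"                        # placeholder
    "UID=<user>;PWD=<password>"                        # placeholder
)
# Simple round trip to confirm the driver and connection work
print(conn.cursor().execute("SELECT 1").fetchone())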
I agree with @Ganesh Chandrasekaran: you can use PySpark. If you are connecting to SQL Server, you can bypass ODBC and use a direct JDBC connection from Databricks instead.
Here is an example:
# Connection details for the Azure SQL database
jdbcHostname = "<Your Sql server>.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "db02"
jdbcUrl = f"jdbc:sqlserver://{jdbcHostname}:{jdbcPort};database={jdbcDatabase}"

connectionProperties = {
    "user": "admin02",
    "password": "Welcome@1"
}

# Read the table over JDBC with the Microsoft SQL Server driver
remote_table = (spark.read
    .format("jdbc")
    .option("url", jdbcUrl)
    .option("user", connectionProperties["user"])
    .option("password", connectionProperties["password"])
    .option("dbtable", "dbo.tbl02")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)
remote_table.show()
Results:
+---+-------+
| id| Name|
+---+-------+
| 1| dilip|
| 2| Raj|
| 3|Narayan|
+---+-------+
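If you only need a subset of the table, the same JDBC source also accepts a query instead of a table name. A short sketch reusing the jdbcUrl and connectionProperties from above (the filter itself is just an illustration):

# Push a query down to SQL Server instead of reading the whole table
filtered = (spark.read
    .format("jdbc")
    .option("url", jdbcUrl)
    .option("user", connectionProperties["user"])
    .option("password", connectionProperties["password"])
    .option("query", "SELECT id, Name FROM dbo.tbl02 WHERE id > 1")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)
filtered.show()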