My goal is to access Azure Synapse Analytics from Azure Databricks. The first thing that came in mind is to use the spark driver com.databricks.spark.sqldw
. But for that, the database user needs to be db_owner
in the database, which is not suitable, since users could mess around with Synapse. I just want users to read data from Synapse using their own Active Directory accounts.
My second shot then is to try using ODBC driver (or JDBC) to access Synapse as we normally do in local Python scripts. The problem is that our Databricks clusters have no internet access, so we can't just run apt-get
like commands (in order to install the ODBC drivers).
So, any of these questions may help me to solve the problem:
How do I copy a file from Azure Storage Account Gen2 to the local databricks cluster file system? I put the ODBC Driver 17 for SQL Server (msodbcsql17_17.10.1.1-1_amd64.deb) in a Container in Storage Account and I can see it using dbutils
. But, can I copy that file to the databricks cluster filesystem?
Is it possible to use the default spark driver com.databricks.spark.sqldw
for accessing Azure Synapse Analytics with SELECT
permission only?
For anyone wondering why I'm not using the Synapse Apache Spark Pool is because I can't run queries like SELECT * FROM A INNER JOIN B...
. And of couse, the Databricks UI is much better ; )
Thanks for any help.
Generically answering to my own question of "how do I access a file from Azure Storage Acount Gen2 using the Databricks cluster filesystem" is quite simple. Just had to mount it using dbutils
and the mount will automatically appears in /dbfs/mnt
. For instance (python script):
dbutils.fs.mount(
source = "wasbs://<container>@<storage>.blob.core.windows.net"
, mount_point = "/mnt/my_adlsgen2"
, extra_configs = {"fs.azure.account.key.<storage>.blob.core.windows.net":"account key"})
and the mount will be at:
/dbfs/mnt/my_adlsgen2
Finally, you can go to the above folder and list files or do whatever you want with it.
In order to install the ODBC Driver that is in the ADLS Gen2, I had to open a new cell and run:
%sh
echo msodbcsql17 msodbcsql/ACCEPT_EULA boolean true | sudo debconf-set-selections
sudo dpkg -i /dbfs/mnt/my_adlsgen2/msodbcsql17_17.10.1.1-1_amd64.deb
And checking if it installed ok:
%sh
odbcinst -j
unixODBC 2.3.6
DRIVERS............: /etc/odbcinst.ini
SYSTEM DATA SOURCES: /etc/odbc.ini
FILE DATA SOURCES..: /etc/ODBCDataSources
USER DATA SOURCES..: /root/.odbc.ini
SQLULEN Size.......: 8
SQLLEN Size........: 8
SQLSETPOSIROW Size.: 8
and lastly:
%sh cat /etc/odbcinst.ini
[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.1.1
UsageCount=1