Search code examples
odbcazure-databricksazure-synapseazure-storage-account

How to install ODBC Driver 17 for SQL Server on a Azure Databricks cluster with no internet access


My goal is to access Azure Synapse Analytics from Azure Databricks. The first thing that came in mind is to use the spark driver com.databricks.spark.sqldw. But for that, the database user needs to be db_owner in the database, which is not suitable, since users could mess around with Synapse. I just want users to read data from Synapse using their own Active Directory accounts.

My second shot then is to try using ODBC driver (or JDBC) to access Synapse as we normally do in local Python scripts. The problem is that our Databricks clusters have no internet access, so we can't just run apt-get like commands (in order to install the ODBC drivers).

So, any of these questions may help me to solve the problem:

  • How do I copy a file from Azure Storage Account Gen2 to the local databricks cluster file system? I put the ODBC Driver 17 for SQL Server (msodbcsql17_17.10.1.1-1_amd64.deb) in a Container in Storage Account and I can see it using dbutils. But, can I copy that file to the databricks cluster filesystem?

  • Is it possible to use the default spark driver com.databricks.spark.sqldw for accessing Azure Synapse Analytics with SELECT permission only?

For anyone wondering why I'm not using the Synapse Apache Spark Pool is because I can't run queries like SELECT * FROM A INNER JOIN B.... And of couse, the Databricks UI is much better ; )

Thanks for any help.


Solution

  • Generically answering to my own question of "how do I access a file from Azure Storage Acount Gen2 using the Databricks cluster filesystem" is quite simple. Just had to mount it using dbutils and the mount will automatically appears in /dbfs/mnt. For instance (python script):

    dbutils.fs.mount(
        source = "wasbs://<container>@<storage>.blob.core.windows.net"
        , mount_point = "/mnt/my_adlsgen2"
        , extra_configs = {"fs.azure.account.key.<storage>.blob.core.windows.net":"account key"})
    

    and the mount will be at:

    /dbfs/mnt/my_adlsgen2
    

    Finally, you can go to the above folder and list files or do whatever you want with it.

    In order to install the ODBC Driver that is in the ADLS Gen2, I had to open a new cell and run:

    %sh
    echo msodbcsql17 msodbcsql/ACCEPT_EULA boolean true | sudo debconf-set-selections
    sudo dpkg -i /dbfs/mnt/my_adlsgen2/msodbcsql17_17.10.1.1-1_amd64.deb
    

    And checking if it installed ok:

    %sh
    odbcinst -j 
    
    unixODBC 2.3.6
    DRIVERS............: /etc/odbcinst.ini
    SYSTEM DATA SOURCES: /etc/odbc.ini
    FILE DATA SOURCES..: /etc/ODBCDataSources
    USER DATA SOURCES..: /root/.odbc.ini
    SQLULEN Size.......: 8
    SQLLEN Size........: 8
    SQLSETPOSIROW Size.: 8
    

    and lastly:

    %sh cat /etc/odbcinst.ini
    
    [ODBC Driver 17 for SQL Server]
    Description=Microsoft ODBC Driver 17 for SQL Server
    Driver=/opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.10.so.1.1
    UsageCount=1