Did anyone had luck connecting from Azure Synapse Analytics provisioned Spark Pool to Azure SQL Database?
Problem 1:
I've uploaded the Spark SQL Connector https://github.com/microsoft/sql-spark-connector as a Workspace Library and linked to the Spark Pool. Installing this causes an error when trying to start Spark Pool Session. I get a Livy error. In the Monitor section the error is :
This application failed due to the total number of errors: 1.
Error code 1
LIBRARY_MANAGEMENT_FAILED
Message
[...] Cleaning up the Spark service job because the cluster has failed.
Edit: This actually works on another Spark Pool for no reason. I don't know the root cause but I was able to run this on another pool.
Problem 2: I'm trying to use the TokenLibrary with Azure SQL Linked Service. This code:
conn = TokenLibrary.getConnectionString("MyAzureSQLDev")
print(conn)
Displays something that looks like Base64-encrypted JWT token plus some unknown characters. This is not a connection string.
I am looking for any working solution.
TokenLibrary.getConnectionString("MyAzureSQLDev")
returns the access token of the workspace identity (MSI). To use the token and write to the database I uploaded sql-spark-connector
to workspace packages and wrote this code:
df.write.format("com.microsoft.sqlserver.jdbc.spark") \
.option("url", "jdbc:sqlserver://%s.database.windows.net:%d" % (sql_server_name, db_port)) \
.option("dbtable", db_table) \
.option("accessToken", mssparkutils.credentials.getConnectionStringOrCreds("MyAzureSQLDev")) \
.option("encrypt", "true") \
.option("databaseName", db_name) \
.option("hostNameInCertificate", "*.database.windows.net") \
.mode("append") \
.save()
The workspace identity has to be added to Azure SQL Database this way:
CREATE USER [your workspace identity] FROM EXTERNAL PROVIDER;
However...
...the currently released version of sql-spark-connector
(version 1.0.1 from November 2020; see here) is not compatible with the current version of Spark 2.4 used by Azure Synapse Analytics. The primary problem is with the version of SQL Server driver - Spark 2.4 on Azure Synapse provides version 8.4.1.jre8, whereas spark-mssql-connector:1.0.1
depends on version 7.2.1.jre8. Hence, installing spark-mssql-connector:1.0.1
on Azure Synapse and running the code above yields NoSuchMethodError
when writing batches of data to the database.
Although spark-mssql-connector
has not been released in a couple of months, it is still in active development and proper support for Spark 2.4 on Azure Synapse has been added in March 2021. I built the latest version from source and used the produced jar instead of the one on the Maven repo.