pyspark · databricks · databricks-unity-catalog

Java SQL DriverManager not working in Unity Catalog


This works in a normal Databricks workspace but fails on a Unity Catalog-enabled cluster:

driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
con = driver_manager.getConnection(connection_string)

statement = con.prepareCall(query)
statement.execute()

statement.close()
con.close()

The code above throws the following error: py4j.security.Py4JSecurityException: Method public static java.sql.Connection java.sql.DriverManager.getConnection(java.lang.String) throws java.sql.SQLException is not whitelisted on class class java.sql.DriverManager

I have tried passing separate arguments to the getConnection method, but it seems this method is also restricted in Unity Catalog:

sqlsUrl = f"jdbc:sqlserver://{host}:{port};database={dbname}"

username = 'test'
password = 'test'
con = jvm.java.sql.DriverManager.getConnection(sqlsUrl, username, password)

Solution

  • A cluster with Unity Catalog enabled and Access Mode set to "Shared" has many restrictions that isolate jobs from different users running on the same (shared) cluster.

    As they put it: "so that users can’t access unprivileged data through the cluster."

    See

    • Access mode limitations
    • Allowlist -- I do not recommend doing this for Spark JARs, because it defeats the purpose of these restrictions; it's like granting root privileges to every user on a machine.
    • This answer has some more info/links.

    The solution is simply to use a cluster with Access Mode set to "Single User". Your jobs run as some user anyway, so you will usually have no reason to use a "Shared" cluster in production.
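    For reference, the access mode maps to the data_security_mode field in the Databricks Clusters API, so a job cluster can be pinned to single-user mode directly in its spec. A minimal sketch; the cluster name, node type, runtime version, and user email below are placeholders:

    ```json
    {
      "cluster_name": "single-user-job-cluster",
      "spark_version": "13.3.x-scala2.12",
      "node_type_id": "Standard_DS3_v2",
      "num_workers": 2,
      "single_user_name": "someone@example.com",
      "data_security_mode": "SINGLE_USER"
    }
    ```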


    PS: Using spark._sc._gateway.jvm.java.sql.DriverManager to create a connection and execute a query is a really odd thing to do in pyspark. You may have chosen the wrong approach for whatever you need done.
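    If the goal is just to run a query against SQL Server from pyspark, the built-in JDBC data source works without touching the JVM gateway. A minimal sketch, reusing the URL format from the question; the host, database, and credentials are hypothetical:

    ```python
    def jdbc_url(host: str, port: int, dbname: str) -> str:
        # Same SQL Server URL format the question builds by hand.
        return f"jdbc:sqlserver://{host}:{port};database={dbname}"

    def read_query(spark, url: str, query: str, user: str, password: str):
        # Run a SELECT through Spark's JDBC data source instead of
        # java.sql.DriverManager; returns a DataFrame.
        return (spark.read
                .format("jdbc")
                .option("url", url)
                .option("query", query)  # pushed down to SQL Server
                .option("user", user)
                .option("password", password)
                .load())

    # Usage (hypothetical server and credentials):
    # df = read_query(spark, jdbc_url("myhost", 1433, "mydb"),
    #                 "SELECT 1 AS x", "test", "test")
    ```

    Note this covers plain queries; executing a stored procedure (which prepareCall suggests) is not something the JDBC data source supports.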