Tags: databricks, azure-databricks

Databricks Error: Constructor public com.databricks.backend.daemon.dbutils.FSUtilsParallel(org.apache.spark.SparkContext) is not whitelisted


When I execute the following dbutils call on our cluster, display(dbutils.fs.mounts()), I get the following error:

py4j.security.Py4JSecurityException: Constructor public com.databricks.backend.daemon.dbutils.FSUtilsParallel(org.apache.spark.SparkContext) is not whitelisted.

Can someone let me know what could be the cause (and the remedy)?

The full error message is as follows:

Py4JError                                 Traceback (most recent call last)
<command-1817294295345329> in <module>
----> 1 display(dbutils.fs.mounts())

/local_disk0/tmp/1684505420167-0/dbutils.py in __getattr__(self, item)
    517             jvm_dbutils = sc._jvm.com.databricks.backend.daemon.dbutils
    518             fs = self.FSHandler(jvm_dbutils.FSUtils,
--> 519                                 jvm_dbutils.FSUtilsParallel(sc._jsc.sc()),
    520                                 jvm_dbutils.DBUtilsCore(sc._jsc.sc(),
    521                                                         self.shell.sqlContext._ssql_ctx),

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1567         answer = self._gateway_client.send_command(command)
   1568         return_value = get_return_value(
-> 1569             answer, self._gateway_client, None, self._fqn)
   1570 
   1571         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    125     def deco(*a, **kw):
    126         try:
--> 127             return f(*a, **kw)
    128         except py4j.protocol.Py4JJavaError as e:
    129             converted = convert_exception(e.java_exception)

Any thoughts?

Cluster configuration: [screenshot of the cluster configuration]


Solution

  • Seems like this might have something to do with Databricks trying to secure things by segregating the runtime for each user when you allow multiple users to share the same cluster.

    Some combination of Cluster Mode/Access Mode (High Concurrency/Shared/Single/...) and Credentials Passthrough settings causes this.

    If it's a dev environment, I recommend you simply use a different Access Mode (like Single User) and work around all these issues; a sketch of switching the mode via the REST API follows this answer.

    I also read that setting spark.databricks.pyspark.enablePy4JSecurity false helped some folks, but that's clearly not something you want to do in prod without understanding what it means; a second sketch below shows where the flag goes and how to check it.

    See some info:

    --- edit ---

    As described in the Databricks docs under "Shared access mode limitations": "Cannot use Scala, R, RDD APIs, or clients that directly read the data from cloud storage, such as DBUtils."

    Your cluster mode is "Custom", which is basically one of the legacy modes, so the "Shared" mode limitations shouldn't strictly apply; still, this is the only documented restriction that says DBUtils isn't allowed.
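
As a concrete illustration of the Single User suggestion above, here is a minimal sketch that reads a cluster's access mode and switches it to Single User through the Databricks Clusters 2.0 REST API. The workspace URL, token, cluster ID, and user name are all placeholders, and clusters/edit expects the core cluster spec to be re-sent, so treat this as a starting point rather than a drop-in fix:

# Sketch only: inspect a cluster's access mode and flip it to Single User
# via the Databricks Clusters 2.0 REST API. Host, token, cluster_id and
# single_user_name below are hypothetical placeholders.
import os
import requests

host = os.environ["DATABRICKS_HOST"]      # e.g. https://adb-1234567890.azuredatabricks.net
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
cluster_id = "0523-123456-abcdefgh"       # placeholder cluster id

# Fetch the current spec; newer API responses include data_security_mode.
spec = requests.get(f"{host}/api/2.0/clusters/get",
                    headers=headers,
                    params={"cluster_id": cluster_id}).json()
print("current access mode:", spec.get("data_security_mode"))

# clusters/edit expects the core spec again, so rebuild a minimal payload
# rather than echoing back read-only fields from clusters/get.
payload = {
    "cluster_id": cluster_id,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    "num_workers": spec.get("num_workers", 1),
    "data_security_mode": "SINGLE_USER",
    "single_user_name": "you@example.com",  # the one user allowed on the cluster
}
requests.post(f"{host}/api/2.0/clusters/edit",
              headers=headers, json=payload).raise_for_status()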
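
And for the spark.databricks.pyspark.enablePy4JSecurity route: the flag belongs in the cluster's Spark config (Edit cluster > Advanced Options > Spark config), not in notebook code. A small check from a notebook, assuming the built-in spark, dbutils, and display objects that Databricks provides:

# In the cluster's Spark config (Advanced Options), one would add the line:
#   spark.databricks.pyspark.enablePy4JSecurity false
# After a cluster restart, confirm the setting took effect.
# ("true" here is just a fallback default in case the key was never set.)
print(spark.conf.get("spark.databricks.pyspark.enablePy4JSecurity", "true"))

# If it reads "false", the original call should no longer be blocked:
display(dbutils.fs.mounts())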