
Databricks 11 MLflow error "permission denied" in create_tmp_dir


I'm getting "permission denied" errors when using MLflow 2.1.8 with Databricks runtime 11.3.

It looks like MLflow is trying to write to /tmp, which isn't writable from this Databricks runtime.

I tried setting MLFLOW_DFS_TMP in the environment, but this seems not to do anything.

It looks like later versions of Databricks like DBR 13 have support for setting the temporary directory, but I'm stuck on DBR 11 for other reasons.


Solution

  • I tracked this down in the MLflow source.

    According to the current source as of 2023-11-30, MLflow is catching all exceptions when looking for the temp directory support on Databricks:

    try:
        return _get_dbutils().entry_point.getReplLocalTempDir()
    except Exception:
        pass
    

    When getReplLocalTempDir does not exist on entry_point (as on DBR 11), the call raises an AttributeError, the broad `except Exception` swallows it, and MLflow falls back to using /tmp.
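
    You can see the mechanism without a Databricks cluster. This is a minimal local simulation (the `FakeEntryPoint` class and `get_tmp_dir` helper are stand-ins I made up, not MLflow code): an object missing `getReplLocalTempDir` triggers the fallback, and attaching the method makes the first branch win.

    ```python
    import tempfile

    class FakeEntryPoint:
        # Stands in for dbutils.entry_point on an old runtime:
        # no getReplLocalTempDir method defined.
        pass

    def get_tmp_dir(entry_point):
        # Mirrors the shape of MLflow's lookup shown above.
        try:
            # Missing attribute -> AttributeError, swallowed below.
            return entry_point.getReplLocalTempDir()
        except Exception:
            pass
        # Fallback: the system temp dir (/tmp on Linux).
        return tempfile.gettempdir()

    # Without the method, we get the fallback.
    print(get_tmp_dir(FakeEntryPoint()))

    # Patch the method onto the object, as in the fix below,
    # and the patched value is returned instead.
    ep = FakeEntryPoint()
    ep.getReplLocalTempDir = lambda: "/writable/dir"
    print(get_tmp_dir(ep))  # → /writable/dir
    ```

    This is exactly why monkey-patching works: the `try` branch only needs the attribute to exist and be callable.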

    I monkey-patched the missing method onto the Databricks entry point from my notebook:

    from mlflow.utils.databricks_utils import _get_dbutils

    def fake_tmp():
        return '/dbfs/...'  # something writable

    # Attach the method MLflow looks for onto the entry_point object
    _get_dbutils().entry_point.getReplLocalTempDir = fake_tmp
    

    This allows mlflow.pyfunc.log_model() to run on the old Databricks runtime. It isn't necessary on DBR 13, which supports setting the temp directory, but I had to stay on DBR 11.